[jira] [Commented] (IMPALA-9150) Restarting minicluster breaks HBase on CDH GBN 1582079

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980663#comment-16980663
 ] 

ASF subversion and git services commented on IMPALA-9150:
-

Commit 4c09975c14f624028100e9940526a111897846cb in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4c09975 ]

IMPALA-9165: Add back hard kill to kill-hbase.sh

The fix for IMPALA-9150 changed kill-hbase.sh to use HBase's
stop-hbase.sh script. Around this time, the GVO timeout issues
started. GVO can reuse machines, so we don't know what state
they may be in. If something failed to kill HBase processes,
the next job would need to be able to kill them even without
access to the last run's files / logs.

This restores the original kill logic to kill-hbase.sh, after
trying a graceful shutdown using HBase's stop-hbase.sh script.
The original kill logic doesn't rely on anything from the
filesystem to know about the existence of processes, so it
would handle machine reuse.
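
As a rough illustration of kill logic that needs nothing from the filesystem,
here is a hedged Java sketch (the actual implementation is the
kill-java-service.sh shell script; the class names are the same ones passed
via -c in the quoted description below):

{code:java}
// Hedged sketch, not the actual script: walk the live process table, match
// JVMs by the HBase class name on their command line, and kill them. No pid
// files or logs from the previous run are consulted, so a reused machine
// with leftover processes is still handled.
import java.util.List;

public class KillHBaseProcesses {
  public static void main(String[] args) {
    List<String> hbaseClasses = List.of("HRegionServer", "HMaster", "HQuorumPeer");
    ProcessHandle.allProcesses()
        .filter(p -> p.info().commandLine()
            .map(cmd -> hbaseClasses.stream().anyMatch(cmd::contains))
            .orElse(false))
        .forEach(ProcessHandle::destroyForcibly);  // hard kill, like kill -9
  }
}
{code}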

This also changes our Jenkins test scripts to shut down the
minicluster at the end.

Testing:
 - Started with a running minicluster, ran bin/clean.sh,
   then ran testdata/bin/kill-all.sh and verified that the
   java processes were gone

Change-Id: Ie2f0b342bcd1d8abea8ef923adbb54a14518a7a6
Reviewed-on: http://gerrit.cloudera.org:8080/14789
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Restarting minicluster breaks HBase on CDH GBN 1582079
> --
>
> Key: IMPALA-9150
> URL: https://issues.apache.org/jira/browse/IMPALA-9150
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Priority: Blocker
> Fix For: Impala 3.4.0
>
>
> On the most recent CDH GBN (1582079), restarting HBase using our normal 
> scripts (testdata/bin/kill-hbase.sh / testdata/bin/run-hbase.sh) results in 
> an unusable HBase. Our testdata/bin/kill-hbase.sh script uses the 
> kill-java-service.sh script:
> {code:java}
> "$DIR"/kill-java-service.sh -c HRegionServer -c HMaster -c HQuorumPeer -s 2
> {code}
> This kills the region servers before the master. On CDH GBN 1582079, the 
> master gets unhappy:
> {noformat}
> 19/11/10 16:40:17 INFO master.RegionServerTracker: RegionServer ephemeral 
> node deleted, processing expiration [localhost,16022,1573402351656]
> 19/11/10 16:40:17 INFO master.ServerManager: Processing expiration of 
> localhost,16022,1573402351656 on localhost,16000,1573402349553
> ... same for other region servers ...
> 19/11/10 16:40:17 INFO procedure.ServerCrashProcedure: Start pid=102, 
> state=RUNNABLE:SERVER_CRASH_START, locked=true; ServerCrashProcedure 
> server=localhost,16022,1573402351656, splitWal=true, meta=false
> ... same for other region servers ...
> 19/11/10 16:40:17 INFO master.SplitLogManager: 
> hdfs://localhost:20500/hbase/WALs/localhost,16023,1573402352683-splitting dir 
> is empty, no logs to split.
> 19/11/10 16:40:17 INFO master.SplitLogManager: Finished splitting (more than 
> or equal to) 0 (0 bytes) in 0 log files in 
> [hdfs://localhost:20500/hbase/WALs/localhost,16023,1573402352683-splitting] 
> in 0ms
> ... more stuff ...
> 19/11/10 16:40:17 ERROR procedure2.ProcedureExecutor: CODE-BUG: Uncaught 
> runtime exception: pid=102, state=RUNNABLE:SERVER_CRASH_ASSIGN, locked=true; 
> ServerCrashProcedure server=localhost,16022,1573402351656, splitWal=true, 
> meta=false
> 19/11/10 16:40:17 ERROR procedure2.ProcedureExecutor: CODE-BUG:
> Uncaught runtime exception: pid=102, state=RUNNABLE:SERVER_CRASH_ASSIGN,
> locked=true; ServerCrashProcedure server=localhost,16022,1573402351656,
> splitWal=true, meta=false
> java.lang.NullPointerException at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createAssignProcedures(AssignmentManager.java:646)
>  at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createRoundRobinAssignProcedures(AssignmentManager.java:601)
>  at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createRoundRobinAssignProcedures(AssignmentManager.java:571)
>  at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:188)
>  at 
> org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:59)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:189)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:965) at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1742)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1481)
>  at 
> 

[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980636#comment-16980636
 ] 

Anurag Mantripragada commented on IMPALA-9188:
--

I think the bug is at this line: 
[https://github.com/apache/impala/blob/e716e76cccf59c2780571429b1b945d6bbc61b8d/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L497]

For a composite primary key like (id, year) we are generating a unique 
constraint name for each column, whereas they should all share the same 
constraint name. In Hive, the comparator sorts by constraint name first and 
then by key_seq when the constraint names are the same. This is why the Hive 
comparator is giving different results. We should generate a new name only if 
key_seq is 1; otherwise, we should reuse the existing constraint name. We 
already do something similar for foreign keys.

[https://github.com/apache/impala/blob/e716e76cccf59c2780571429b1b945d6bbc61b8d/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L565]
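
For illustration, a hedged sketch of the proposed fix; SQLPrimaryKey is the
real HMS thrift class, but the method, naming scheme, and surrounding names
are assumptions, not Impala's actual code:

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.SQLPrimaryKey;

// Hypothetical sketch: mint one constraint name per primary key and reuse it
// for every column of the composite key, mirroring the foreign-key path. Only
// the column with key_seq == 1 gets a fresh name, so (id, year) sorts back
// into its declared order under Hive's (name, key_seq) comparator.
static List<SQLPrimaryKey> buildPrimaryKeys(String db, String tbl, List<String> pkCols) {
  List<SQLPrimaryKey> pks = new ArrayList<>();
  String pkName = null;
  for (int keySeq = 1; keySeq <= pkCols.size(); keySeq++) {
    if (keySeq == 1) pkName = "pk_" + tbl + "_" + System.nanoTime();  // assumed naming scheme
    pks.add(new SQLPrimaryKey(db, tbl, pkCols.get(keySeq - 1), keySeq, pkName,
        false /*enable*/, false /*validate*/, true /*rely*/));
  }
  return pks;
}
{code}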

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
> corresponding primary key or unique key exists. Possible keys: 
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> 

[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980624#comment-16980624
 ] 

Sahil Takiar commented on IMPALA-9188:
--

Yeah, switching the order fixes it. Something odd is going on though. In Hive, 
the order must be {{foreign key(id, year) references parent_table(id, year)}} 
but in Impala the order must be {{foreign key(year, id) references 
parent_table(year, id)}}.

I would assume the order should be the same regardless of the engine used, so 
there is probably a bug somewhere. Regardless, assuming just switching the 
order fixes the issue, we can hold off on the revert.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
> corresponding primary key or unique key exists. Possible keys: 
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, 

[jira] [Commented] (IMPALA-7984) Port UpdateFilter() and PublishFilter() to KRPC

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980611#comment-16980611
 ] 

ASF subversion and git services commented on IMPALA-7984:
-

Commit e716e76cccf59c2780571429b1b945d6bbc61b8d in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e716e76 ]

IMPALA-9154: Revert "IMPALA-7984: Port runtime filter from Thrift RPC to KRPC"

The previous patch porting runtime filter from Thrift RPC to KRPC
introduces a deadlock if there are a very limited number of threads on
the Impala cluster.

Specifically, in that patch a Coordinator used a synchronous KRPC to
propagate an aggregated filter to other hosts. A deadlock would happen
if there is no thread available on the receiving side to answer that
KRPC especially the calling and receiving threads are called from the
same thread pool. One possible way to address this issue is to make
the call of propagating a runtime filter asynchronous to free the
calling thread. Before resolving this issue, we revert this patch for
now.
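
As an illustration of the failure mode, a minimal plain-Java sketch
(java.util.concurrent standing in for KRPC, so only the shape of the problem
is Impala's): a task that synchronously waits on work scheduled into its own
fixed-size pool deadlocks once every worker is occupied by a waiting caller.

{code:java}
import java.util.concurrent.*;

public class SameThreadPoolDeadlock {
  public static void main(String[] args) throws Exception {
    // One worker thread: the outer task occupies it, then blocks on a Future
    // that can only be completed by that same worker.
    ExecutorService pool = Executors.newFixedThreadPool(1);
    Future<String> outer = pool.submit(() -> {
      Future<String> inner = pool.submit(() -> "filter propagated");
      return inner.get();  // blocks forever: no free worker to run 'inner'
    });
    System.out.println(outer.get());  // never reached; the JVM hangs here
  }
}
{code}

Making the propagation asynchronous, as the commit message suggests, frees
the calling thread so the pool can make progress.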

This reverts commit ec11c18884988e838a8838e1e8ecc37461e1a138.

Change-Id: I32371a515fb607da396914502da8c7fb071406bc
Reviewed-on: http://gerrit.cloudera.org:8080/14780
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Port UpdateFilter() and PublishFilter() to KRPC
> ---
>
> Key: IMPALA-7984
> URL: https://issues.apache.org/jira/browse/IMPALA-7984
> Project: IMPALA
>  Issue Type: Task
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Fang-Yu Rao
>Priority: Major
> Fix For: Impala 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8709) Add Damerau-Levenshtein edit distance built-in function

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980609#comment-16980609
 ] 

ASF subversion and git services commented on IMPALA-8709:
-

Commit a862282811e76767c6c5d7874db2a310586f2421 in impala's branch 
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a862282 ]

IMPALA-8709: Add Damerau-Levenshtein edit distance built-in function

This patch adds new built-in functions to calculate restricted
Damerau-Levenshtein edit distance (optimal string alignment).
Implemented as dle_dst() and damerau_levenshtein(). If either or both
values are NULL, the functions return NULL, which differs from Netezza's
dle_dst(), which returns the length of the non-NULL value, or 0 if both
values are NULL. The NULL behavior matches the existing levenshtein()
function.

Also cleans up levenshtein tests.

Testing:
- Added unit tests to expr-test.cc
- Manual testing on over 1400 string pairs from
  http://marvin.cs.uidaho.edu/misspell.html and results match Netezza

Change-Id: Ib759817ec15e7075bf49d51e494e45c8af4db94d
Reviewed-on: http://gerrit.cloudera.org:8080/13794
Reviewed-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
Tested-by: Impala Public Jenkins 


> Add Damerau-Levenshtein edit distance built-in function
> ---
>
> Key: IMPALA-8709
> URL: https://issues.apache.org/jira/browse/IMPALA-8709
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Greg Rahn
>Assignee: Greg Rahn
>Priority: Major
>  Labels: built-in-function
>
> Algo (restricted DL / optimal string alignment)
>  [https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance]
> References:
>  
> [https://www.ibm.com/support/knowledgecenter/en/SSULQD_7.2.1/com.ibm.nz.dbu.doc/r_dbuser_functions_expressions_fuzzy_funcs.html]
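
For reference, a hedged Java sketch of the restricted variant named above
(the textbook optimal-string-alignment DP, not Impala's actual C++
implementation; the NULL handling described in the commit message is omitted
and non-null inputs are assumed):

{code:java}
// Restricted Damerau-Levenshtein / optimal string alignment: Levenshtein's
// insert/delete/substitute recurrence plus one adjacent-transposition case.
// Unlike unrestricted DL, no substring is edited more than once, so e.g.
// osaDistance("ca", "abc") == 3 while unrestricted DL gives 2.
static int osaDistance(String a, String b) {
  int n = a.length(), m = b.length();
  int[][] d = new int[n + 1][m + 1];
  for (int i = 0; i <= n; i++) d[i][0] = i;
  for (int j = 0; j <= m; j++) d[0][j] = j;
  for (int i = 1; i <= n; i++) {
    for (int j = 1; j <= m; j++) {
      int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
      d[i][j] = Math.min(Math.min(
          d[i - 1][j] + 1,          // deletion
          d[i][j - 1] + 1),         // insertion
          d[i - 1][j - 1] + cost);  // substitution
      if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
          && a.charAt(i - 2) == b.charAt(j - 1)) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);  // transposition
      }
    }
  }
  return d[n][m];
}
{code}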



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9154) KRPC DataStreamService threads blocked in PublishFilter

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980610#comment-16980610
 ] 

ASF subversion and git services commented on IMPALA-9154:
-

Commit e716e76cccf59c2780571429b1b945d6bbc61b8d in impala's branch 
refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e716e76 ]

IMPALA-9154: Revert "IMPALA-7984: Port runtime filter from Thrift RPC to KRPC"

The previous patch porting runtime filter from Thrift RPC to KRPC
introduces a deadlock if there are a very limited number of threads on
the Impala cluster.

Specifically, in that patch a Coordinator used a synchronous KRPC to
propagate an aggregated filter to other hosts. A deadlock would happen
if there is no thread available on the receiving side to answer that
KRPC, especially when the calling and receiving threads come from the
same thread pool. One possible way to address this issue is to make
the call of propagating a runtime filter asynchronous to free the
calling thread. Before resolving this issue, we revert this patch for
now.

This reverts commit ec11c18884988e838a8838e1e8ecc37461e1a138.

Change-Id: I32371a515fb607da396914502da8c7fb071406bc
Reviewed-on: http://gerrit.cloudera.org:8080/14780
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> KRPC DataStreamService threads blocked in PublishFilter
> ---
>
> Key: IMPALA-9154
> URL: https://issues.apache.org/jira/browse/IMPALA-9154
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 3.4.0
>Reporter: Tim Armstrong
>Assignee: Fang-Yu Rao
>Priority: Blocker
>  Labels: hang
> Attachments: image-2019-11-13-08-30-27-178.png, pstack-exchange.txt
>
>
> I hit this on primitive_many_fragments when doing a single node perf run:
> {noformat}
>  ./bin/single_node_perf_run.py --num_impalads=1 --scale=30 --ninja 
> --workloads=targeted-perf  --iterations=5
> {noformat}
> I noticed that the query was hung and the execution threads were stuck sending 
> row batches. Then, looking at the RPCz page, all of the threads were busy:
>  !image-2019-11-13-08-30-27-178.png! 
> Multiple threads were stuck in UpdateFilter() - see  [^pstack-exchange.txt]. 
> It looks like this is a deadlock bug because a KRPC thread is blocked waiting 
> for an RPC that needs to be served by one of the limited threads from that 
> same thread pool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9193) Enable precommit tests for USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980605#comment-16980605
 ] 

Sahil Takiar commented on IMPALA-9193:
--

+1, having this in place would have prevented IMPALA-9188.

> Enable precommit tests for USE_CDP_HIVE=true
> 
>
> Key: IMPALA-9193
> URL: https://issues.apache.org/jira/browse/IMPALA-9193
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Priority: Critical
>
> The USE_CDP_HIVE=true configuration is currently not tested by the 
> gerrit-verify-dryrun job beyond a build in all-build-options-ub1604. The 
> configuration has reached a level of stability where tests should regularly 
> pass. We regularly do active development on this configuration, so it is 
> important not to break it.
> We should add some more in-depth testing of USE_CDP_HIVE=true in our 
> precommit tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980599#comment-16980599
 ] 

Sahil Takiar commented on IMPALA-9188:
--

I'll try digging into it a bit, but if I can't figure it out soon I'll probably 
have to revert it because a few builds are blocked on it.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
> corresponding primary key or unique key exists. Possible keys: 
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable 
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
> 

[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Anurag Mantripragada (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980597#comment-16980597
 ] 

Anurag Mantripragada commented on IMPALA-9188:
--

From the logs, it looks like there was a mismatch in the order of the PK 
column names [expecting (year, id), but got (id, year)]:
{code:java}
 2019-11-22T06:36:59,945 ERROR [pool-10-thread-13] 
metastore.RetryingHMSHandler: MetaException(message:Foreign key references 
id:int;year:string; but no corresponding primary key or unique key exists. 
Possible keys: [year:string;id:int;]){code}
I looked at the differences in what Hive is doing in Hive 2 and Hive 3. Looks 
like Hive 3 introduced a canonical way to get PK column names (using a 
combination of PKName and Key_Seq). I do not have access to my dev machine to 
check why HMS would get a different order of PK columns from what it 
expects. Below is the code that does that.

[https://github.infra.cloudera.com/CDH/hive/blob/95c21a5a1671b27d55b30d8abe5ad185616a4457/standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L4988]
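
For illustration, a hedged sketch of the canonical ordering described above;
SQLPrimaryKey is the real HMS thrift class, but this comparator is an
assumption based on the comment, not code copied from ObjectStore:

{code:java}
import java.util.Comparator;
import org.apache.hadoop.hive.metastore.api.SQLPrimaryKey;

// Sorting by constraint name first and key_seq second only reproduces the
// declared column order when all columns of a composite key share one name.
// With a unique name per column (as Impala was generating), the name
// comparison dominates and (id, year) can come back as (year, id), which is
// the mismatch seen in the logs above.
Comparator<SQLPrimaryKey> canonicalPkOrder =
    Comparator.comparing(SQLPrimaryKey::getPk_name)
              .thenComparingInt(SQLPrimaryKey::getKey_seq);
{code}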

I can take a look at it, but I do not know how soon. If no one has cycles to 
check this, I think it is best to revert the change.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
> corresponding primary key or unique key exists. Possible keys: 
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> 

[jira] [Created] (IMPALA-9194) Add support for Debian 9

2019-11-22 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-9194:
-

 Summary: Add support for Debian 9
 Key: IMPALA-9194
 URL: https://issues.apache.org/jira/browse/IMPALA-9194
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 3.4.0
Reporter: Joe McDonnell


Debian 9 has been available for a couple of years and seems like a useful addition 
to our Debian 8 support. 

 

This will require a corresponding change in 
[https://github.com/cloudera/native-toolchain] to support Debian 9.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9193) Enable precommit tests for USE_CDP_HIVE=true

2019-11-22 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-9193:
-

 Summary: Enable precommit tests for USE_CDP_HIVE=true
 Key: IMPALA-9193
 URL: https://issues.apache.org/jira/browse/IMPALA-9193
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.4.0
Reporter: Joe McDonnell


The USE_CDP_HIVE=true configuration is currently not tested by the 
gerrit-verify-dryrun job beyond a build in all-build-options-ub1604. The 
configuration has reached a level of stability where tests should regularly 
pass. We regularly do active development on this configuration, so it is 
important not to break it.

We should add some more in-depth testing of USE_CDP_HIVE=true in our precommit 
tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Assigned] (IMPALA-9183) TPC-DS query 13 - customer_address predicates not propagated to scan

2019-11-22 Thread David Rorke (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rorke reassigned IMPALA-9183:
---

Assignee: Aman Sinha

> TPC-DS query 13 - customer_address predicates not propagated to scan
> 
>
> Key: IMPALA-9183
> URL: https://issues.apache.org/jira/browse/IMPALA-9183
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.4.0
>Reporter: David Rorke
>Assignee: Aman Sinha
>Priority: Major
>  Labels: tpc-ds
> Attachments: profile_q13.txt, profile_q13_mod.txt, q13_mod_plan.png, 
> q13_plan.png, query13.sql, query13_mod.sql
>
>
> TPC-DS query 13 has a set of predicates on the customer_address table, 
> ca_state column that are currently evaluated after the join of 
> customer_address and store_sales.   The ca_state predicates could be pushed 
> down to the customer_address scan node.  This would reduce the size of the 
> join input by a factor of 3.4.
> As an experiment I added an additional redundant predicate to the query (see 
> attached query13_mod.sql) which causes the planner to evaluate the predicate 
> at the scan node. 
> Performance of the original and modified queries at 10 TB scale factor:
> Original:  164 seconds
> Modified: 44 seconds
> Query profiles for both versions attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980583#comment-16980583
 ] 

Sahil Takiar commented on IMPALA-9188:
--

Looked through the HMS logs. According to the logs, the parent tables are 
created.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION: 
>  MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
> Metastore:
> CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
> corresponding primary key or unique key exists. Possible keys: 
> [year:string;id:int;]{code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable 
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
> HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, 

[jira] [Updated] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-9188:
-
Description: 
When USE_CDP_HIVE=true, Impala builds are failing during dataload when creating 
tables with PK/FK constraints.

The error is:
{code:java}
ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
functional_seq_record_snap.child_table (
seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE RELY, 
foreign key
(id, year) references functional_seq_record_snap.parent_table(id, year) DISABLE 
NOVALIDATE
RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
DISABLE
NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/child_table'
Traceback (most recent call last):
  File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
result = impala_client.execute(query)
  File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
handle = self.__execute_query(query_string.strip(), user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
handle = self.execute_query_async(query_string, user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
  File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
ImpalaBeeswaxException: ImpalaBeeswaxException:
 INNER EXCEPTION: 
 MESSAGE: ImpalaRuntimeException: Error making 'createTable' RPC to Hive 
Metastore:
CAUSED BY: MetaException: Foreign key references id:int;year:string; but no 
corresponding primary key or unique key exists. Possible keys: 
[year:string;id:int;]{code}

The corresponding error in HMS is:
{code:java}
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 18: 
source:127.0.0.1 create_table_req: Table(tableName:child_table, 
dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:seq, 
type:int, comment:null), FieldSchema(name:id, type:int, comment:null), 
FieldSchema(name:year, type:string, comment:null), FieldSchema(name:a, 
type:int, comment:null)], 
location:hdfs://localhost:20500/test-warehouse/child_table, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, 
accessType:8)
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
ugi=jenkins  ip=127.0.0.1  cmd=source:127.0.0.1 create_table_req: 
Table(tableName:child_table, dbName:functional_seq_record_gzip, owner:jenkins, 
createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
location:hdfs://localhost:20500/test-warehouse/child_table, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, 
accessType:8)
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Starting translation for CreateTable for 
processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] 
on table child_table
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Table to be created is of type 
EXTERNAL_TABLE but not MANAGED_TABLE
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Transformer returning 
table:Table(tableName:child_table, dbName:functional_seq_record_gzip, 
owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
type:string, comment:null), 

[jira] [Created] (IMPALA-9192) When built with USE_CDP_HIVE=true, Impala should use CDP Avro, Parquet, etc

2019-11-22 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-9192:
-

 Summary: When built with USE_CDP_HIVE=true, Impala should use CDP 
Avro, Parquet, etc
 Key: IMPALA-9192
 URL: https://issues.apache.org/jira/browse/IMPALA-9192
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.4.0
Reporter: Joe McDonnell


The USE_CDP_HIVE=true configuration should get dependencies from the 
CDP_BUILD_NUMBER wherever possible. I found that there are still a few things 
left that come from the CDH_BUILD_NUMBER. Specifically, Parquet, Avro, and Kite 
are still using CDH versions:
{noformat}
export IMPALA_PARQUET_VERSION=1.10.99-cdh6.x-SNAPSHOT
export IMPALA_AVRO_JAVA_VERSION=1.8.2-cdh6.x-SNAPSHOT
...
export IMPALA_KITE_VERSION=1.0.0-cdh6.x-SNAPSHOT{noformat}
This is important to be compatible with other CDP components like Hive. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Created] (IMPALA-9191) Provide a way to build Impala with only one of Sentry / Ranger

2019-11-22 Thread Joe McDonnell (Jira)
Joe McDonnell created IMPALA-9191:
-

 Summary: Provide a way to build Impala with only one of Sentry / 
Ranger
 Key: IMPALA-9191
 URL: https://issues.apache.org/jira/browse/IMPALA-9191
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Joe McDonnell


Deployments of Impala will use either Ranger or Sentry, and would not switch 
back and forth between the two. It makes sense to provide a way to 
pick at compile time which one to include. This allows packagers of Impala to 
avoid a dependency for whichever authorization provider they don't need.

In particular, compilation of the USE_CDP_HIVE=true side of Impala currently 
needs only a few things from the CDH_BUILD_NUMBER, and one of them is Sentry. In 
the other direction, the only thing a USE_CDP_HIVE=false configuration uses 
from the CDP_BUILD_NUMBER is Ranger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980571#comment-16980571
 ] 

Vihang Karajgaonkar commented on IMPALA-9188:
-

Hmm, do you know if the tables are created sequentially? According to what I 
see in {{functional_schema_template.sql}}:

{noformat}
====
---- DATASET
functional
---- BASE_TABLE_NAME
parent_table
---- CREATE
CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name} (
id INT, year string, primary key(id, year) DISABLE NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/{table_name}';
---- ROW_FORMAT
delimited fields terminated by ','
---- LOAD
`hadoop fs -mkdir -p /test-warehouse/parent_table && hadoop fs -put -f \
${IMPALA_HOME}/testdata/data/parent_table.txt /test-warehouse/parent_table/
====
---- DATASET
functional
---- BASE_TABLE_NAME
parent_table_2
---- CREATE
CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name} (
a INT, primary key(a) DISABLE NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/{table_name}';
---- ROW_FORMAT
delimited fields terminated by ','
---- LOAD
`hadoop fs -mkdir -p /test-warehouse/parent_table_2 && hadoop fs -put -f \
${IMPALA_HOME}/testdata/data/parent_table_2.txt /test-warehouse/parent_table_2/
====
---- DATASET
functional
---- BASE_TABLE_NAME
child_table
---- CREATE
CREATE EXTERNAL TABLE IF NOT EXISTS {db_name}{db_suffix}.{table_name} (
seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE RELY,
foreign key (id, year) references {db_name}{db_suffix}.parent_table(id, year)
DISABLE NOVALIDATE RELY,
foreign key(a) references {db_name}{db_suffix}.parent_table_2(a)
DISABLE NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/{table_name}';
---- ROW_FORMAT
delimited fields terminated by ','
---- LOAD
`hadoop fs -mkdir -p /test-warehouse/child_table && hadoop fs -put -f \
${IMPALA_HOME}/testdata/data/child_table.txt /test-warehouse/child_table/
====
{noformat}

The parent tables should be created first. Can we check the HMS logs to confirm 
if the parent tables were indeed created? If yes, then there is something more 
going on and 

[jira] [Commented] (IMPALA-9190) CatalogdMetaProviderTest.testPiggybackFailure is flaky

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980484#comment-16980484
 ] 

Sahil Takiar commented on IMPALA-9190:
--

Only seen once in an exhaustive build with the data cache enabled.

> CatalogdMetaProviderTest.testPiggybackFailure is flaky
> --
>
> Key: IMPALA-9190
> URL: https://issues.apache.org/jira/browse/IMPALA-9190
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Priority: Major
>  Labels: broken-build, flaky
>
> The following test is flaky:
> org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure
> Error Message
> {code}
> Did not see enough piggybacked loads!
> Stacktrace
> java.lang.AssertionError: Did not see enough piggybacked loads!
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProviderTest.doTestPiggyback(CatalogdMetaProviderTest.java:314)
>   at 
> org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure(CatalogdMetaProviderTest.java:273)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9190) CatalogdMetaProviderTest.testPiggybackFailure is flaky

2019-11-22 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9190:


 Summary: CatalogdMetaProviderTest.testPiggybackFailure is flaky
 Key: IMPALA-9190
 URL: https://issues.apache.org/jira/browse/IMPALA-9190
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar


The following test is flaky:

org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure

Error Message
{code}
Did not see enough piggybacked loads!
Stacktrace
java.lang.AssertionError: Did not see enough piggybacked loads!
at org.junit.Assert.fail(Assert.java:88)
at 
org.apache.impala.catalog.local.CatalogdMetaProviderTest.doTestPiggyback(CatalogdMetaProviderTest.java:314)
at 
org.apache.impala.catalog.local.CatalogdMetaProviderTest.testPiggybackFailure(CatalogdMetaProviderTest.java:273)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:236)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:386)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:323)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:143)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-9189) Add command to purge data cache

2019-11-22 Thread Michael Ho (Jira)
Michael Ho created IMPALA-9189:
--

 Summary: Add command to purge data cache
 Key: IMPALA-9189
 URL: https://issues.apache.org/jira/browse/IMPALA-9189
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Michael Ho


It would be great to have a command to purge the data cache on demand so that
every Impala compute node's data cache is emptied. This is useful, for example,
for experimentation or demos. cc'ing [~drorke], [~joemcdonnell]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980472#comment-16980472
 ] 

Sahil Takiar commented on IMPALA-9188:
--

[~vihangk1] any ideas on this? Otherwise, I think we might have to revert 
IMPALA-9104.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION:  {code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable 
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
> HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] 
> on table child_table
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Table to be created is of type 
> EXTERNAL_TABLE but not MANAGED_TABLE
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> 

[jira] [Commented] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980459#comment-16980459
 ] 

Sahil Takiar commented on IMPALA-9188:
--

IMPALA-9104 was recently merged, and I haven't found a successful build with 
this change + USE_CDP_HIVE=true. So it doesn't seem like a recent change to 
Hive / HMS caused this.

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Major
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION:  {code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable 
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
> HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] 
> on table child_table
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Table to be created is of type 
> 

[jira] [Updated] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-9188:
-
Priority: Critical  (was: Major)

> Dataload is failing when USE_CDP_HIVE=true
> --
>
> Key: IMPALA-9188
> URL: https://issues.apache.org/jira/browse/IMPALA-9188
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Anurag Mantripragada
>Priority: Critical
>
> When USE_CDP_HIVE=true, Impala builds are failing during dataload when 
> creating tables with PK/FK constraints.
> The error is:
> {code:java}
> ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
> functional_seq_record_snap.child_table (
> seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE 
> RELY, foreign key
> (id, year) references functional_seq_record_snap.parent_table(id, year) 
> DISABLE NOVALIDATE
> RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
> DISABLE
> NOVALIDATE RELY)
> row format delimited fields terminated by ','
> LOCATION '/test-warehouse/child_table'
> Traceback (most recent call last):
>   File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
> result = impala_client.execute(query)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
> handle = self.__execute_query(query_string.strip(), user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
> handle = self.execute_query_async(query_string, user=user)
>   File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
> execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
>   File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  INNER EXCEPTION:  {code}
> The corresponding error in HMS is:
> {code:java}
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 
> 18: source:127.0.0.1 create_table_req: Table(tableName:child_table, 
> dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
> lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
> ugi=jenkins  ip=127.0.0.1cmd=source:127.0.0.1 create_table_req: 
> Table(tableName:child_table, dbName:functional_seq_record_gzip, 
> owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
> sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
> FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
> type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
> location:hdfs://localhost:20500/test-warehouse/child_table, 
> inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
> outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
> compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
> serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
> parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
> sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
> OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
> viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, 
> ownerType:USER, accessType:8)
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Starting translation for CreateTable 
> for processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
> HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] 
> on table child_table
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Table to be created is of type 
> EXTERNAL_TABLE but not MANAGED_TABLE
> 2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
> metastore.MetastoreDefaultTransformer: Transformer returning 
> 

[jira] [Resolved] (IMPALA-9109) Create Catalog debug page top-k average table loading ranking

2019-11-22 Thread Jiawei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiawei Wang resolved IMPALA-9109.
-
  Assignee: Jiawei Wang  (was: Dinesh Garg)
Resolution: Fixed

> Create Catalog debug page top-k average table loading ranking
> -
>
> Key: IMPALA-9109
> URL: https://issues.apache.org/jira/browse/IMPALA-9109
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Jiawei Wang
>Assignee: Jiawei Wang
>Priority: Critical
>
> Right now we have top-k rankings of tables by memory requirements, number of
> operations, and number of files. It would be great if we could also have a
> ranking by table metadata loading time.
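
For illustration, the ranking itself reduces to a small top-k (heap) computation
over recorded load times; a minimal Python sketch with made-up table names and
timings, not Impala's implementation:
{code:python}
# Illustration only -- table names and timings are placeholders, not catalogd data.
import heapq

load_times_ms = {
    "functional.alltypes": 1240,
    "tpch.lineitem": 9810,
    "tpcds.store_sales": 15400,
}

# Top-2 slowest-loading tables, slowest first.
top_k = heapq.nlargest(2, load_times_ms.items(), key=lambda kv: kv[1])
print(top_k)  # [('tpcds.store_sales', 15400), ('tpch.lineitem', 9810)]
{code}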



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Resolved] (IMPALA-9110) Add table loading time break-down metrics for HdfsTable

2019-11-22 Thread Jiawei Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiawei Wang resolved IMPALA-9110.
-
  Assignee: Jiawei Wang  (was: Dinesh Garg)
Resolution: Fixed

> Add table loading time break-down metrics for HdfsTable
> ---
>
> Key: IMPALA-9110
> URL: https://issues.apache.org/jira/browse/IMPALA-9110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Jiawei Wang
>Assignee: Jiawei Wang
>Priority: Critical
>
> We are only able to get the total table loading time right now, which makes it
> really hard to debug why table loading is sometimes slow. Therefore, it would
> be good to have break-down metrics on how much time each function costs when
> loading tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



[jira] [Created] (IMPALA-9188) Dataload is failing when USE_CDP_HIVE=true

2019-11-22 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9188:


 Summary: Dataload is failing when USE_CDP_HIVE=true
 Key: IMPALA-9188
 URL: https://issues.apache.org/jira/browse/IMPALA-9188
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Anurag Mantripragada


When USE_CDP_HIVE=true, Impala builds are failing during dataload when creating 
tables with PK/FK constraints.

The error is:
{code:java}
ERROR: CREATE EXTERNAL TABLE IF NOT EXISTS 
functional_seq_record_snap.child_table (
seq int, id int, year string, a int, primary key(seq) DISABLE NOVALIDATE RELY, 
foreign key
(id, year) references functional_seq_record_snap.parent_table(id, year) DISABLE 
NOVALIDATE
RELY, foreign key(a) references functional_seq_record_snap.parent_table_2(a) 
DISABLE
NOVALIDATE RELY)
row format delimited fields terminated by ','
LOCATION '/test-warehouse/child_table'
Traceback (most recent call last):
  File "Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
result = impala_client.execute(query)
  File "Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
handle = self.__execute_query(query_string.strip(), user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 362, in __execute_query
handle = self.execute_query_async(query_string, user=user)
  File "Impala/tests/beeswax/impala_beeswax.py", line 356, in 
execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
  File "Impala/tests/beeswax/impala_beeswax.py", line 519, in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
ImpalaBeeswaxException: ImpalaBeeswaxException:
 INNER EXCEPTION:  {code}

The corresponding error in HMS is:
{code:java}
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] metastore.HiveMetaStore: 18: 
source:127.0.0.1 create_table_req: Table(tableName:child_table, 
dbName:functional_seq_record_gzip, owner:jenkins, createTime:0, 
lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:seq, 
type:int, comment:null), FieldSchema(name:id, type:int, comment:null), 
FieldSchema(name:year, type:string, comment:null), FieldSchema(name:a, 
type:int, comment:null)], 
location:hdfs://localhost:20500/test-warehouse/child_table, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, 
accessType:8)
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] HiveMetaStore.audit: 
ugi=jenkins  ip=127.0.0.1cmd=source:127.0.0.1 create_table_req: 
Table(tableName:child_table, dbName:functional_seq_record_gzip, owner:jenkins, 
createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 
location:hdfs://localhost:20500/test-warehouse/child_table, 
inputFormat:org.apache.hadoop.mapred.TextInputFormat, 
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, 
serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, 
parameters:{serialization.format=,, field.delim=,}), bucketCols:null, 
sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, 
OBJCAPABILITIES=EXTREAD,EXTWRITE}, viewOriginalText:null, 
viewExpandedText:null, tableType:EXTERNAL_TABLE, catName:hive, ownerType:USER, 
accessType:8)
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Starting translation for CreateTable for 
processor Impala3.4.0-SNAPSHOT@localhost with [EXTWRITE, EXTREAD, 
HIVEMANAGEDINSERTREAD, HIVEMANAGEDINSERTWRITE, HIVESQL, HIVEMQT, HIVEBUCKET2] 
on table child_table
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Table to be created is of type 
EXTERNAL_TABLE but not MANAGED_TABLE
2019-11-22T06:36:59,937  INFO [pool-10-thread-13] 
metastore.MetastoreDefaultTransformer: Transformer returning 
table:Table(tableName:child_table, dbName:functional_seq_record_gzip, 
owner:jenkins, createTime:0, lastAccessTime:0, retention:0, 
sd:StorageDescriptor(cols:[FieldSchema(name:seq, type:int, comment:null), 
FieldSchema(name:id, type:int, comment:null), FieldSchema(name:year, 
type:string, comment:null), FieldSchema(name:a, type:int, comment:null)], 

[jira] [Resolved] (IMPALA-9100) tests/run-tests.py should handle duplicate --skip-stress flags

2019-11-22 Thread Joe McDonnell (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-9100.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> tests/run-tests.py should handle duplicate --skip-stress flags
> --
>
> Key: IMPALA-9100
> URL: https://issues.apache.org/jira/browse/IMPALA-9100
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> If you pass --skip-stress multiple times to tests/run-tests.py, only the 
> first is removed before the arguments are passed to pytest:
> {noformat}
>   skip_stress = '--skip-stress' in sys.argv
>   if skip_stress:
> sys.argv.remove("--skip-stress"){noformat}
> This is also true for skip_serial and skip_parallel.
> This matters for the docker-based tests, because the docker-based tests run 
> the serial end-to-end tests with --skip-stress specified. run-all-tests.sh 
> also adds a --skip-stress argument when running core tests:
> {noformat}
> if [[ "${EXPLORATION_STRATEGY}" == "core" ]]; then
>   # Skip the stress test in core - all stress tests are in exhaustive and
>   # pytest startup takes a significant amount of time.
>   RUN_TESTS_ARGS+=" --skip-stress"
> fi{noformat}
> Only one skip-stress is removed, and the other one gets passed to pytest, 
> which immediately fails without running tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)



[jira] [Commented] (IMPALA-9100) tests/run-tests.py should handle duplicate --skip-stress flags

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980437#comment-16980437
 ] 

ASF subversion and git services commented on IMPALA-9100:
-

Commit d747cc3646511531d6ae4479ec17012c58a596f4 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d747cc3 ]

IMPALA-9100: Handle duplicate occurrences of flags for tests/run-tests.py

If someone passes --skip-stress multiple times to tests/run-tests.py,
it currently only removes one of the occurrences from the arguments
and allows the other one to pass through to pytest. This causes pytest
to immediately error out. This behavior is seen on the docker-based
tests, because test-with-docker.py specifies --skip-stress and
bin/run-all-tests.sh adds another --skip-stress for core runs.

This changes tests/run-tests.py to handle multiple occurrences of
--skip-stress, --skip-parallel, and --skip-serial.

Testing:
 - Tested manually with duplicate skip flags.

Change-Id: I60dc9a898f69804e2a53c05b5dfab2f948a22097
Reviewed-on: http://gerrit.cloudera.org:8080/14629
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
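
A minimal sketch of the approach the commit describes (illustrative only; the
actual change lives in tests/run-tests.py):
{code:python}
# Remove every occurrence of a skip flag from argv before the remaining
# arguments are handed to pytest; a single list.remove() only drops the first.
import sys

def strip_flag(argv, flag):
  """Removes all occurrences of flag in place; returns True if it was present."""
  present = flag in argv
  while flag in argv:
    argv.remove(flag)
  return present

skip_stress = strip_flag(sys.argv, "--skip-stress")
skip_serial = strip_flag(sys.argv, "--skip-serial")
skip_parallel = strip_flag(sys.argv, "--skip-parallel")
{code}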


> tests/run-tests.py should handle duplicate --skip-stress flags
> --
>
> Key: IMPALA-9100
> URL: https://issues.apache.org/jira/browse/IMPALA-9100
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
>
> If you pass --skip-stress multiple times to tests/run-tests.py, only the 
> first is removed before the arguments are passed to pytest:
> {noformat}
>   skip_stress = '--skip-stress' in sys.argv
>   if skip_stress:
> sys.argv.remove("--skip-stress"){noformat}
> This is also true for skip_serial and skip_parallel.
> This matters for the docker-based tests, because the docker-based tests run 
> the serial end-to-end tests with --skip-stress specified. run-all-tests.sh 
> also adds a --skip-stress argument when running core tests:
> {noformat}
> if [[ "${EXPLORATION_STRATEGY}" == "core" ]]; then
>   # Skip the stress test in core - all stress tests are in exhaustive and
>   # pytest startup takes a significant amount of time.
>   RUN_TESTS_ARGS+=" --skip-stress"
> fi{noformat}
> Only one skip-stress is removed, and the other one gets passed to pytest, 
> which immediately fails without running tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9110) Add table loading time break-down metrics for HdfsTable

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980438#comment-16980438
 ] 

ASF subversion and git services commented on IMPALA-9110:
-

Commit 65198faa3beeea13aec905f8cda8f644e99af960 in impala's branch 
refs/heads/master from Jiawei Wang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=65198fa ]

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
  - all column stats loading time
  - ValidWriteIds loading time
  - all partitions loading time - total time
    - file metadata loading time
  - storage-metadata-loading-time (standalone metric)

1. Table Schema Loading:
* Meaning: the time for HMS to fetch the table object and the real schema
loading time. Normally, the code path is
"msClient.getHiveClient().getTable(dbName, tblName)".
* Metric: hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading (standalone metric)
* Meaning: storage-related file system operations during metadata loading
(the amount of time spent loading metadata from the underlying storage layer).
* Metric: renamed to load-duration.storage-metadata. This metric was
introduced by IMPALA-7322.

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: the file metadata loading for all partitions. (This is part
of 2.4.) Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata
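
To make the hierarchy concrete, a hedged Python sketch of nested breakdown
timers (the metric names mirror the list above; Impala's implementation is in
Java and differs):
{code:python}
# Illustration only: nested timers keyed by the breakdown metric names.
import time
from contextlib import contextmanager

metrics = {}

@contextmanager
def timed(name):
  start = time.monotonic()
  try:
    yield
  finally:
    metrics[name] = time.monotonic() - start

with timed("load-duration.total-time"):
  with timed("load-duration.all-column-stats"):
    time.sleep(0.01)  # stand-in for loading all column stats
  with timed("load-duration.all-partitions.total-time"):
    with timed("load-duration.all-partitions.file-metadata"):
      time.sleep(0.01)  # stand-in for loading file metadata

print(sorted(metrics.items()))
{code}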

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Reviewed-on: http://gerrit.cloudera.org:8080/14611
Reviewed-by: Vihang Karajgaonkar 
Tested-by: Impala Public Jenkins 


> Add table loading time break-down metrics for HdfsTable
> ---
>
> Key: IMPALA-9110
> URL: https://issues.apache.org/jira/browse/IMPALA-9110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog, Frontend
>Reporter: Jiawei Wang
>Assignee: Dinesh Garg
>Priority: Critical
>
> We are only able to get total table loading time right now, which makes it 
> really hard for us to debug why sometimes table loading is slow. Therefore, 
> it would be good to have a break-down metrics on how much time each function 
> cost when loading tables.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7322) Add storage wait time to profile for operations with metadata load

2019-11-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980439#comment-16980439
 ] 

ASF subversion and git services commented on IMPALA-7322:
-

Commit 65198faa3beeea13aec905f8cda8f644e99af960 in impala's branch 
refs/heads/master from Jiawei Wang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=65198fa ]

IMPALA-9110: Add table loading time break-down metrics for HdfsTable

A. Problem:
Catalog table loading currently only records the total loading
time. We will need some break-down times, i.e. more detailed
time recording on each loading function. Also, the table schema
loading is not taken into account for load-duration. We will need
to add some more metrics for that.

B. Solution:
- We added "hms-load-tbl-schema", "load-duration.all-column-stats",
"load-duration.all-partitions.total-time",
"load-duration.all-partitions.file-metadata".
Also, we logged the loadValidWriteIdList() time. So now we have
a more detailed breakdown time for table loading info.

The table loading time metrics for HDFS tables are in the following hierarchy:
- Table Schema Loading
- Table Metadata Loading - total time
  - all column stats loading time
  - ValidWriteIds loading time
  - all partitions loading time - total time
    - file metadata loading time
  - storage-metadata-loading-time (standalone metric)

1. Table Schema Loading:
* Meaning: the time for HMS to fetch the table object and the real schema
loading time. Normally, the code path is
"msClient.getHiveClient().getTable(dbName, tblName)".
* Metric: hms-load-tbl-schema

2. Table Metadata Loading -- total time
* Meaning: The time to load all the table metadata.
The code path is load() function in HdfsTable.load() function.
* Metric: load-duration.total-time

2.1 Table Metadata Loading -- all column stats
* Meaning: load all column stats, this is part of table metadata loading
The code path is HdfsTable.loadAllColumnStats()
* Metric: load-duration.all-column-stats

2.2 Table Metadata Loading -- loadValidWriteIdList
* Meaning: fetch ValidWriteIds from HMS
The code path is HdfsTable.loadValidWriteIdList()
* Metric: no metric recorded for this one. Instead, a debug log is
generated.

2.3 Table Metadata Loading -- storage metadata loading (standalone metric)
* Meaning: storage-related file system operations during metadata loading
(the amount of time spent loading metadata from the underlying storage layer).
* Metric: renamed to load-duration.storage-metadata. This metric was
introduced by IMPALA-7322.

2.4 Table Metadata Loading -- load all partitions
* Meaning: Load all partitions time, including fetching all partitions
from HMS and loading all partitions. The code path is
MetaStoreUtil.fetchAllPartitions() and HdfsTable.loadAllPartitions()
* Metric: load-duration.all-partitions

2.4.1 Table Metadata Loading -- load all partitions -- load file metadata
* Meaning: the file metadata loading for all partitions. (This is part
of 2.4.) Code path: loadFileMetadataForPartitions() inside
loadAllPartitions()
* Metric: load-duration.all-partitions.file-metadata

C. Extra thing in this commit:
1. Add PrintUtils.printTimeNs for PrettyPrint time in FrontEnd
2. Add explanation for table loading manager

D. Test:
1. Add Unit tests for PrintUtils.printTime() function
2. Manual describe table and verify the table loading metrics are
correct.

Change-Id: I5381f9316df588b2004876c6cd9fb7e674085b10
Reviewed-on: http://gerrit.cloudera.org:8080/14611
Reviewed-by: Vihang Karajgaonkar 
Tested-by: Impala Public Jenkins 


> Add storage wait time to profile for operations with metadata load
> --
>
> Key: IMPALA-7322
> URL: https://issues.apache.org/jira/browse/IMPALA-7322
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Yongzhi Chen
>Priority: Major
> Fix For: Impala 3.4.0
>
>
> The profile of a REFRESH or of the query triggering metadata load should 
> point out how much time was spent waiting for source systems.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3343) Impala-shell compatibility with python 3

2019-11-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3343:
--
Target Version: Impala 3.4.0  (was: Product Backlog)

> Impala-shell compatibility with python 3
> 
>
> Key: IMPALA-3343
> URL: https://issues.apache.org/jira/browse/IMPALA-3343
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.5.0
>Reporter: Peter Ebert
>Assignee: David Knupp
>Priority: Critical
>
> After installing the Anaconda package and Python 3, Impala shell produces
> errors and will not run in the Python 3 environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-3343) Impala-shell compatibility with python 3

2019-11-22 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3343:
--
Priority: Critical  (was: Minor)

> Impala-shell compatibility with python 3
> 
>
> Key: IMPALA-3343
> URL: https://issues.apache.org/jira/browse/IMPALA-3343
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.5.0
>Reporter: Peter Ebert
>Assignee: David Knupp
>Priority: Critical
>
> After installing the Anaconda package and Python 3, Impala shell produces
> errors and will not run in the Python 3 environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8991) Data loading failed in Hive with error initializing MapReduce cluster

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980435#comment-16980435
 ] 

Sahil Takiar commented on IMPALA-8991:
--

Saw this happen again in 
https://jenkins.impala.io/job/parallel-all-tests-nightly/1106/

> Data loading failed in Hive with error initializing MapReduce cluster
> -
>
> Key: IMPALA-8991
> URL: https://issues.apache.org/jira/browse/IMPALA-8991
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Michael Ho
>Priority: Major
>
> Data loading with
> {noformat}
> insert into table functional_seq_gzip.alltypesaggmultifilesnopart
> SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
> float_col, double_col, date_string_col, string_col, timestamp_col
> FROM functional.alltypesaggmultifilesnopart where id % 4 = 2;
> {noformat}
> failed in Hive with an unexpected exception:
> {noformat}
> java.io.IOException: Cannot initialize Cluster. Please check your 
> configuration for mapreduce.framework.name and the correspond server 
> addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:116)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:109)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:102)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:475)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:454)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:402)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Suppressed: java.io.IOException: Failed to use 
> org.apache.hadoop.mapred.LocalClientProtocolProvider due to error:
> at 
> org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:148)
> ... 25 more
> Caused by: org.apache.hadoop.metrics2.MetricsException: Metrics 
> source LocalJobRunnerMetrics-1803220940 already exists!
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:152)
> at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:125)
> at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
> at 
> org.apache.hadoop.mapred.LocalJobRunnerMetrics.create(LocalJobRunnerMetrics.java:46)
> at 
> org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:777)
> at 
> org.apache.hadoop.mapred.LocalJobRunner.<init>(LocalJobRunner.java:770)
> at 
> org.apache.hadoop.mapred.LocalClientProtocolProvider.create(LocalClientProtocolProvider.java:42)
> at 
> org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:130)
> ... 25 more
> Number of reduce tasks is set to 0 since there's no reduce operator
> Job Submission failed with exception 'java.io.IOException(Cannot initialize 
> Cluster. Please check your configuration for mapreduce.framework.name and the 
> correspond server addresses.)'
> FAILED: Execution Error, return code 1 from 
> 

[jira] [Commented] (IMPALA-9157) TestAuthorizationProvider.test_invalid_provider_flag fails due to Python 2.6 incompatible code

2019-11-22 Thread David Knupp (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980419#comment-16980419
 ] 

David Knupp commented on IMPALA-9157:
-

Patch: https://gerrit.cloudera.org/c/14788/

> TestAuthorizationProvider.test_invalid_provider_flag fails due to Python 2.6 
> incompatible code
> --
>
> Key: IMPALA-9157
> URL: https://issues.apache.org/jira/browse/IMPALA-9157
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: David Knupp
>Priority: Blocker
>  Labels: broken-build
>
> Our Centos 6 builds use Python 2.6, which means that it doesn't have 
> check_output (added in Python 2.7). This causes test failures in 
> test_provider.py:
>  
> {noformat}
> authorization/test_provider.py:70: in setup_method
> self.pre_test_cores = set([f for f in possible_cores if is_core_dump(f)])
> ../lib/python/impala_py_lib/helpers.py:64: in is_core_dump
> file_std_out = exec_local_command("file %s" % file_path)
> ../lib/python/impala_py_lib/helpers.py:34: in exec_local_command
> return subprocess.check_output(cmd.split())
> E   AttributeError: 'module' object has no attribute 'check_output'{noformat}
> This comes from the new code to handle intentional core dumps:
>  
> [https://github.com/apache/impala/blob/master/lib/python/impala_py_lib/helpers.py#L34]
> {noformat}
> import subprocess
> 
> def exec_local_command(cmd):
>   """Executes a command for the local bash shell and returns stdout as a
>   string.
> 
>   Args:
>     cmd: command as a string
>   Return:
>     STDOUT
>   """
>   return subprocess.check_output(cmd.split()){noformat}
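
A Python 2.6-compatible fallback is to call subprocess.Popen directly; a
minimal sketch under that assumption (not necessarily the actual patch):
{code:python}
# subprocess.check_output() only exists on Python >= 2.7; Popen works on 2.6.
import subprocess

def exec_local_command(cmd):
  """Executes a command in the local shell and returns its stdout."""
  proc = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
  stdout, _ = proc.communicate()
  return stdout
{code}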



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-9157) TestAuthorizationProvider.test_invalid_provider_flag fails due to Python 2.6 incompatible code

2019-11-22 Thread David Knupp (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-9157 started by David Knupp.
---
> TestAuthorizationProvider.test_invalid_provider_flag fails due to Python 2.6 
> incompatible code
> --
>
> Key: IMPALA-9157
> URL: https://issues.apache.org/jira/browse/IMPALA-9157
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: David Knupp
>Priority: Blocker
>  Labels: broken-build
>
> Our Centos 6 builds use Python 2.6, which means that it doesn't have 
> check_output (added in Python 2.7). This causes test failures in 
> test_provider.py:
>  
> {noformat}
> authorization/test_provider.py:70: in setup_method
> self.pre_test_cores = set([f for f in possible_cores if is_core_dump(f)])
> ../lib/python/impala_py_lib/helpers.py:64: in is_core_dump
> file_std_out = exec_local_command("file %s" % file_path)
> ../lib/python/impala_py_lib/helpers.py:34: in exec_local_command
> return subprocess.check_output(cmd.split())
> E   AttributeError: 'module' object has no attribute 'check_output'{noformat}
> This comes from the new code to handle intentional core dumps:
>  
> [https://github.com/apache/impala/blob/master/lib/python/impala_py_lib/helpers.py#L34]
> {noformat}
> import subprocess
> 
> def exec_local_command(cmd):
>   """Executes a command for the local bash shell and returns stdout as a
>   string.
> 
>   Args:
>     cmd: command as a string
>   Return:
>     STDOUT
>   """
>   return subprocess.check_output(cmd.split()){noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9187) TestExecutorGroups.test_executor_group_shutdown is flaky

2019-11-22 Thread Sahil Takiar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980408#comment-16980408
 ] 

Sahil Takiar commented on IMPALA-9187:
--

Looking at the test, I think there is a race condition: the query can still be 
in the COMPILED state when the assert {{assert "Initial admission queue 
reason: number of running queries" in profile, profile}} is hit, in which case 
it hasn't been submitted for admission control yet.
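
A typical de-flaking approach would be to poll the runtime profile until the
query has actually been queued before asserting on the queue reason. A minimal
sketch (get_profile is a placeholder callable, not the test framework's actual
API):

{code}
import time

def wait_for_profile_substring(get_profile, substring, timeout_s=60,
                               interval_s=0.1):
    """Poll the query profile until the expected substring shows up,
    returning the profile; raise with the last profile on timeout."""
    deadline = time.time() + timeout_s
    profile = ""
    while time.time() < deadline:
        profile = get_profile()
        if substring in profile:
            return profile
        time.sleep(interval_s)
    raise AssertionError("substring never appeared in profile:\n" + profile)

# profile = wait_for_profile_substring(
#     lambda: <fetch the query's profile text here>,
#     "Initial admission queue reason: number of running queries")
{code}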

> TestExecutorGroups.test_executor_group_shutdown is flaky
> 
>
> Key: IMPALA-9187
> URL: https://issues.apache.org/jira/browse/IMPALA-9187
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Lars Volker
>Priority: Major
>  Labels: broken-build, flaky
>
> The following test is flaky:
> custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_group_shutdown
>  (from pytest)
> Error Message
> {code}
> AssertionError: Query (id=6c4bb1c6f501bae4:ee491183): DEBUG MODE 
> WARNING: Query profile created while running a DEBUG build of Impala. Use 
> RELEASE builds to measure query performance. Summary: Session ID: 
> 104c00e26afad563:fad6988e52bf9cba Session Type: BEESWAX Start Time: 
> 2019-11-22 00:19:26.497324000 End Time: Query Type: QUERY Query State: 
> COMPILED Query Status: OK Impala Version: impalad version 3.4.0-SNAPSHOT 
> DEBUG (build 2bdca39a8b178b7186dd24141a8e97fa0c46358f) User: jenkins 
> Connected User: jenkins Delegated User: Network Address: 127.0.0.1:59977 
> Default Db: default Sql Statement: select sleep(3) Coordinator: []:22000 
> Query Options (set by configuration): 
> TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown
>  Query Options (set by configuration and planner): 
> NUM_NODES=1,NUM_SCANNER_THREADS=1,RUNTIME_FILTER_MODE=0,MT_DOP=0,TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown
>  Plan:  Max Per-Host Resource Reservation: Memory=0B 
> Threads=1 Per-Host Resource Estimates: Memory=10MB Dedicated Coordinator 
> Resource Estimate: Memory=100MB Codegen disabled by planner Analyzed query: 
> SELECT sleep(CAST(3 AS INT)) F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 
> instances=1 | Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1 PLAN-ROOT SINK | output exprs: sleep(3) | 
> mem-estimate=0B mem-reservation=0B thread-reservation=0 | 00:UNION 
> constant-operands=1 mem-estimate=0B mem-reservation=0B thread-reservation=0 
> tuple-ids=0 row-size=1B cardinality=1 in pipelines:   
> Estimated Per-Host Mem: 10485760 Request Pool: default-pool Per Host Min 
> Memory Reservation: []:22000(0) Per Host Number of Fragment Instances: 
> []:22000(1) Admission result: Queued Query Compilation: 5.077ms - Metadata of 
> all 0 tables cached: 679.990us (679.990us) - Analysis finished: 1.269ms 
> (589.508us) - Authorization finished (noop): 1.350ms (81.387us) - Value 
> transfer graph computed: 1.681ms (330.356us) - Single node plan created: 
> 1.801ms (120.709us) - Distributed plan created: 1.880ms (78.868us) - Planning 
> finished: 5.077ms (3.196ms) Query Timeline: 11.000ms - Query submitted: 
> 0.000ns (0.000ns) - Planning finished: 7.000ms (7.000ms) - Submit for 
> admission: 9.000ms (2.000ms) - Queued: 11.000ms (2.000ms) - 
> ComputeScanRangeAssignmentTimer: 0.000ns Frontend: ImpalaServer: - 
> ClientFetchWaitTimer: 0.000ns - NumRowsFetched: 0 (0) - 
> NumRowsFetchedFromCache: 0 (0) - RowMaterializationRate: 0 - 
> RowMaterializationTimer: 0.000ns assert 'Initial admission queue reason: 
> number of running queries' in 'Query 
> (id=6c4bb1c6f501bae4:ee491183):\n DEBUG MODE WARNING: Query profile 
> created while running a DEBUG buil...0)\n - NumRowsFetchedFromCache: 0 (0)\n 
> - RowMaterializationRate: 0\n - RowMaterializationTimer: 0.000ns\n'
> {code}
> Stacktrace
> {code}
> custom_cluster/test_executor_groups.py:185: in test_executor_group_shutdown 
> assert "Initial admission queue reason: number of running queries" in 
> profile, profile E AssertionError: Query 
> (id=6c4bb1c6f501bae4:ee491183): E DEBUG MODE WARNING: Query profile 
> created while running a DEBUG build of Impala. Use RELEASE builds to measure 
> query performance. E Summary: E Session ID: 104c00e26afad563:fad6988e52bf9cba 
> E Session Type: BEESWAX E Start Time: 2019-11-22 00:19:26.497324000 E End 
> Time: E Query Type: QUERY E Query State: COMPILED E Query Status: OK
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org

[jira] [Created] (IMPALA-9187) TestExecutorGroups.test_executor_group_shutdown is flaky

2019-11-22 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-9187:


 Summary: TestExecutorGroups.test_executor_group_shutdown is flaky
 Key: IMPALA-9187
 URL: https://issues.apache.org/jira/browse/IMPALA-9187
 Project: IMPALA
  Issue Type: Bug
Reporter: Sahil Takiar
Assignee: Lars Volker


The following test is flaky:

custom_cluster.test_executor_groups.TestExecutorGroups.test_executor_group_shutdown
 (from pytest)

Error Message

{code}
AssertionError: Query (id=6c4bb1c6f501bae4:ee491183): DEBUG MODE 
WARNING: Query profile created while running a DEBUG build of Impala. Use 
RELEASE builds to measure query performance. Summary: Session ID: 
104c00e26afad563:fad6988e52bf9cba Session Type: BEESWAX Start Time: 2019-11-22 
00:19:26.497324000 End Time: Query Type: QUERY Query State: COMPILED Query 
Status: OK Impala Version: impalad version 3.4.0-SNAPSHOT DEBUG (build 
2bdca39a8b178b7186dd24141a8e97fa0c46358f) User: jenkins Connected User: jenkins 
Delegated User: Network Address: 127.0.0.1:59977 Default Db: default Sql 
Statement: select sleep(3) Coordinator: []:22000 Query Options (set by 
configuration): 
TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown
 Query Options (set by configuration and planner): 
NUM_NODES=1,NUM_SCANNER_THREADS=1,RUNTIME_FILTER_MODE=0,MT_DOP=0,TIMEZONE=America/Los_Angeles,CLIENT_IDENTIFIER=custom_cluster/test_executor_groups.py::TestExecutorGroups::()::test_executor_group_shutdown
 Plan:  Max Per-Host Resource Reservation: Memory=0B Threads=1 
Per-Host Resource Estimates: Memory=10MB Dedicated Coordinator Resource 
Estimate: Memory=100MB Codegen disabled by planner Analyzed query: SELECT 
sleep(CAST(3 AS INT)) F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 | 
Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1 
PLAN-ROOT SINK | output exprs: sleep(3) | mem-estimate=0B mem-reservation=0B 
thread-reservation=0 | 00:UNION constant-operands=1 mem-estimate=0B 
mem-reservation=0B thread-reservation=0 tuple-ids=0 row-size=1B cardinality=1 
in pipelines:   Estimated Per-Host Mem: 10485760 Request 
Pool: default-pool Per Host Min Memory Reservation: []:22000(0) Per Host Number 
of Fragment Instances: []:22000(1) Admission result: Queued Query Compilation: 
5.077ms - Metadata of all 0 tables cached: 679.990us (679.990us) - Analysis 
finished: 1.269ms (589.508us) - Authorization finished (noop): 1.350ms 
(81.387us) - Value transfer graph computed: 1.681ms (330.356us) - Single node 
plan created: 1.801ms (120.709us) - Distributed plan created: 1.880ms 
(78.868us) - Planning finished: 5.077ms (3.196ms) Query Timeline: 11.000ms - 
Query submitted: 0.000ns (0.000ns) - Planning finished: 7.000ms (7.000ms) - 
Submit for admission: 9.000ms (2.000ms) - Queued: 11.000ms (2.000ms) - 
ComputeScanRangeAssignmentTimer: 0.000ns Frontend: ImpalaServer: - 
ClientFetchWaitTimer: 0.000ns - NumRowsFetched: 0 (0) - 
NumRowsFetchedFromCache: 0 (0) - RowMaterializationRate: 0 - 
RowMaterializationTimer: 0.000ns assert 'Initial admission queue reason: number 
of running queries' in 'Query (id=6c4bb1c6f501bae4:ee491183):\n DEBUG 
MODE WARNING: Query profile created while running a DEBUG buil...0)\n - 
NumRowsFetchedFromCache: 0 (0)\n - RowMaterializationRate: 0\n - 
RowMaterializationTimer: 0.000ns\n'
{code}

Stacktrace

{code}
custom_cluster/test_executor_groups.py:185: in test_executor_group_shutdown 
assert "Initial admission queue reason: number of running queries" in profile, 
profile E AssertionError: Query (id=6c4bb1c6f501bae4:ee491183): E DEBUG 
MODE WARNING: Query profile created while running a DEBUG build of Impala. Use 
RELEASE builds to measure query performance. E Summary: E Session ID: 
104c00e26afad563:fad6988e52bf9cba E Session Type: BEESWAX E Start Time: 
2019-11-22 00:19:26.497324000 E End Time: E Query Type: QUERY E Query State: 
COMPILED E Query Status: OK
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IMPALA-9186) NUMA-awareness for separate join build

2019-11-22 Thread Tim Armstrong (Jira)
Tim Armstrong created IMPALA-9186:
-

 Summary: NUMA-awareness for separate join build
 Key: IMPALA-9186
 URL: https://issues.apache.org/jira/browse/IMPALA-9186
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Reporter: Tim Armstrong


Currently joins are executed by a single thread, so memory allocations will 
generally come from the local NUMA node (assuming the thread stays on the same 
NUMA node).

With the separate join build on a NUMA machine, some probe threads will likely 
be on a different NUMA node from the build, resulting in inefficiency. This may 
not be a major problem because of prefetching, etc., but filing this JIRA to 
track the potential issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-9165) Precommit jobs getting stuck in testdata/bin/create-hbase.sh

2019-11-22 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16980323#comment-16980323
 ] 

Csaba Ringhofer commented on IMPALA-9165:
-

[~joemcdonnell]
I saw this issue recurring in a recent build: 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1683

I found a suspicious error in HBase's zookeeper logs:
{code}
19/11/22 10:04:01 INFO server.NIOServerCnxnFactory: binding to port 
0.0.0.0/0.0.0.0:2181
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
at 
org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:119)
at 
org.apache.hadoop.hbase.zookeeper.HQuorumPeer.runZKServer(HQuorumPeer.java:94)
at 
org.apache.hadoop.hbase.zookeeper.HQuorumPeer.main(HQuorumPeer.java:78)
{code}
see 
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1683/artifact/Impala/logs_static/logs/cluster/hbase/hbase-ubuntu-zookeeper-ip-172-31-26-202.out/*view*/

I also saw this in another failed run:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1681/artifact/Impala/logs_static/logs/cluster/hbase/hbase-ubuntu-zookeeper-ip-172-31-37-124.out/*view*/

But I didn't see it in a healthy run:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/1682/artifact/Impala/logs_static/logs/cluster/hbase/hbase-ubuntu-zookeeper-ip-172-31-26-202.out/*view*/
where binding to the same port 2181 succeeded.
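
If a stale ZooKeeper from a previous run is still holding the port, a
fail-fast probe before starting HBase would surface this immediately. A
minimal sketch (an illustration only, not part of the Impala scripts):

{code}
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if the TCP port can be bound, i.e. nothing (such as a
    stale ZooKeeper) is already listening on it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        return False
    finally:
        sock.close()

if not port_is_free(2181):
    raise SystemExit("port 2181 already in use; kill the stale ZooKeeper first")
{code}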



> Precommit jobs getting stuck in testdata/bin/create-hbase.sh
> 
>
> Key: IMPALA-9165
> URL: https://issues.apache.org/jira/browse/IMPALA-9165
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
>
> Several of our precommit jobs have been getting stuck and cancelled with this 
> log:
> {noformat}
> 01:54:19 /home/ubuntu/Impala/testdata/target
> 01:54:19 SUCCESS, data generated into /home/ubuntu/Impala/testdata/target
> 01:54:20 Executing: create-load-data.sh 
> 01:54:20 Generating HBase data (logging to 
> /home/ubuntu/Impala/logs/data_loading/create-hbase.log)... 
> 11:23:26 FATAL: Unable to delete script file 
> /tmp/jenkins4199210326564796706.sh
> 11:23:26 java.lang.InterruptedException
> 11:23:26  at java.lang.Object.wait(Native Method)
> 11:23:26  at hudson.remoting.Request.call(Request.java:177)
> 11:23:26  at hudson.remoting.Channel.call(Channel.java:956)
> 11:23:26  at hudson.FilePath.act(FilePath.java:1070)
> 11:23:26  at hudson.FilePath.act(FilePath.java:1059)
> 11:23:26  at hudson.FilePath.delete(FilePath.java:1540)
> 11:23:26  at 
> hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
> 11:23:26  at 
> hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
> 11:23:26  at 
> hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> 11:23:26  at 
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
> 11:23:26  at hudson.model.Build$BuildExecution.build(Build.java:206)
> 11:23:26  at hudson.model.Build$BuildExecution.doRun(Build.java:163)
> 11:23:26  at 
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
> 11:23:26  at hudson.model.Run.execute(Run.java:1818)
> 11:23:26  at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> 11:23:26  at 
> hudson.model.ResourceController.execute(ResourceController.java:97)
> 11:23:26  at hudson.model.Executor.run(Executor.java:429){noformat}
> This indicates that the job is getting stuck inside 
> testdata/bin/create-hbase.sh. The logs are not available because the job 
> gets cancelled.
> Some example jobs with this problem:
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/8847/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/8762/]
> [https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/8850/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-9167) stress/test_acid_stress.py gets TimeoutError on s3

2019-11-22 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-9167.
---
Fix Version/s: Impala 3.4.0
   Resolution: Fixed

> stress/test_acid_stress.py gets TimeoutError on s3
> --
>
> Key: IMPALA-9167
> URL: https://issues.apache.org/jira/browse/IMPALA-9167
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Joe McDonnell
>Assignee: Zoltán Borók-Nagy
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 3.4.0
>
>
> On s3, stress/test_acid_stress.py's 
> TestAcidInsertsBasic.test_read_hive_inserts() fails with the following error:
> {noformat}
> /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/tests/stress/test_acid_stress.py:189:
>  in test_read_hive_inserts
> self._run_test_read_hive_inserts(unique_database, is_partitioned)
> /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/tests/stress/test_acid_stress.py:167:
>  in _run_test_read_hive_inserts
> sleep_seconds=3)])
> /data/jenkins/workspace/impala-cdpd-master-core-s3/repos/Impala/tests/stress/test_acid_stress.py:51:
>  in run_tasks
> pool.map_async(Task.run, tasks).get(600)
> /usr/lib64/python2.7/multiprocessing/pool.py:550: in get
> raise TimeoutError
> E   TimeoutError{noformat}
> This looks like it was added in IMPALA-8648 
> ([https://github.com/apache/impala/commit/7ccfc43d963e4c36d065f1d912b52cf983e0595d])
> I have only seen this on s3.
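
For context, the failing call is multiprocessing's AsyncResult.get(timeout),
which raises TimeoutError when the worker tasks haven't all finished within
the timeout (600s in the trace above). A minimal sketch of that harness
pattern (run_task is a placeholder for the real insert/read tasks):

{code}
import multiprocessing
import time

def run_task(seconds):
    # Stand-in for a real read/write task against the warehouse.
    time.sleep(seconds)
    return seconds

if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    try:
        # get() raises multiprocessing.TimeoutError if the tasks do not all
        # finish within the timeout -- the failure seen on s3 above.
        results = pool.map_async(run_task, [1, 2, 3]).get(600)
    except multiprocessing.TimeoutError:
        raise AssertionError("tasks did not finish within the timeout")
    finally:
        pool.close()
        pool.join()
{code}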



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org


