ColumnLineageGraph.java Compile Error in Frontend

2017-10-03 Thread
Hi all,

I hit a compile error when I tried to recompile Impala yesterday. The error is
in the frontend:

[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /mnt/volume1/impala-orc/incubator-impala/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:[593,11] no suitable method found for putString(java.lang.String)
    method com.google.common.hash.Hasher.putString(java.lang.CharSequence,java.nio.charset.Charset) is not applicable
      (actual and formal argument lists differ in length)
    method com.google.common.hash.PrimitiveSink.putString(java.lang.CharSequence,java.nio.charset.Charset) is not applicable
      (actual and formal argument lists differ in length)


I see the same error in the Jenkins builds. It seems that
com.google.common.hash.Hasher exists in both guava-*.jar and
hive-exec-*.jar. Has hive-exec-1.1.0-cdh5.14.0-SNAPSHOT.jar changed
recently? What can I do to recover from this?
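A quick way to check which jar actually supplies Hasher on the frontend classpath is to ask the JVM. Below is a minimal sketch using only standard reflection; ClasspathProbe is a hypothetical helper, not part of Impala or Guava. It prints where a class was loaded from and whether that class still exposes a given method signature. For context, Guava deprecated the one-argument Hasher.putString(CharSequence) in 15.0 and later removed it, so a copy of Hasher that lacks it would produce exactly the compile error above.

```java
import java.nio.charset.Charset;
import java.security.CodeSource;

/**
 * Hypothetical classpath diagnostic (not part of Impala or Guava): reports
 * where a class was loaded from and whether it exposes a method signature.
 */
public class ClasspathProbe {

  /** Jar or directory the class came from, "bootstrap" for JDK classes, or "not found". */
  public static String locate(String className) {
    try {
      Class<?> cls = Class.forName(className);
      CodeSource src = cls.getProtectionDomain().getCodeSource();
      // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource.
      if (src == null || src.getLocation() == null) {
        return "bootstrap";
      }
      return src.getLocation().toString();
    } catch (ClassNotFoundException e) {
      return "not found";
    }
  }

  /** True if the class has a public method (declared or inherited) with exactly these parameter types. */
  public static boolean hasMethod(String className, String name, Class<?>... paramTypes) {
    try {
      Class.forName(className).getMethod(name, paramTypes);
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String hasher = "com.google.common.hash.Hasher";
    System.out.println("Hasher loaded from: " + locate(hasher));
    System.out.println("putString(CharSequence): "
        + hasMethod(hasher, "putString", CharSequence.class));
    System.out.println("putString(CharSequence, Charset): "
        + hasMethod(hasher, "putString", CharSequence.class, Charset.class));
  }
}
```

Run it with the same classpath as the frontend build (for example, the one printed by Maven's dependency:build-classpath goal) so the probe sees roughly the same jar order the compiler does. If Hasher resolves to hive-exec-*.jar and the one-argument method is missing, the bundled Guava copy is the likely culprit.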


Thanks,

Quanlong


Question about the multi-thread scan node model

2017-08-30 Thread
Hi all,


I’m working on porting our ORC-support patch (IMPALA-5717) to the latest
code base. Since our patch is based on cdh-5.7.3-release, which was released
a year ago, there is a lot of merge work.


One of the biggest changes since cdh-5.7.3-release that I've noticed is the
new scan node and scanner model introduced in IMPALA-3902. I think it was
inspired by the investigation in IMPALA-2849, but I can't find any
performance report in that issue. Could you share any performance results
for this multi-threading refactor?


I’m wondering how much this improves performance, since the old model (a
single-threaded scan node with multi-threaded scanners) already provided
concurrent I/O for reads, and most OLAP queries are I/O-bound.
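To make the comparison concrete, here is a generic Java sketch of the old shape described above: several scanner threads process scan ranges concurrently and hand batches to a single consumer through a bounded queue. It is an illustrative producer/consumer model only, not Impala's C++ code; ScanModelSketch, Batch and the fake per-range work are hypothetical. It shows where the I/O concurrency comes from (the scanner pool) and that everything after the queue runs on one thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: N scanner threads feed one consumer thread via a bounded queue. */
public class ScanModelSketch {

  /** Stand-in for a row batch: which scan range it came from and how many rows it holds. */
  static final class Batch {
    final int rangeId;
    final long rows;
    Batch(int rangeId, long rows) { this.rangeId = rangeId; this.rows = rows; }
  }

  /** End-of-stream marker, compared by identity. */
  private static final Batch EOS = new Batch(-1, 0);

  /** Scans numRanges ranges on numScanners threads; the calling thread consumes all batches. */
  public static long scan(int numRanges, int numScanners) {
    BlockingQueue<Batch> queue = new ArrayBlockingQueue<>(16); // bounded: scanners block when full
    ExecutorService scanners = Executors.newFixedThreadPool(numScanners);
    for (int r = 0; r < numRanges; r++) {
      final int rangeId = r;
      scanners.submit(() -> {
        // Placeholder for reading and decoding one scan range (the I/O-bound part).
        queue.put(new Batch(rangeId, rangeId + 1L));
        return null; // makes this a Callable, so put() may throw InterruptedException
      });
    }
    scanners.shutdown();
    // Once every scanner task has finished, enqueue the end-of-stream marker.
    Thread closer = new Thread(() -> {
      try {
        scanners.awaitTermination(1, TimeUnit.MINUTES);
        queue.put(EOS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    closer.start();
    long totalRows = 0;
    try {
      // Single consumer: all downstream work (here, just summing) runs on this one thread.
      for (Batch b = queue.take(); b != EOS; b = queue.take()) {
        totalRows += b.rows;
      }
      closer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while consuming batches", e);
    }
    return totalRows;
  }

  public static void main(String[] args) {
    System.out.println(scan(100, 4)); // sums the fake row counts 1..100
  }
}
```

With 100 ranges and 4 scanner threads, scan(100, 4) returns 5050 (the fake row counts 1..100 summed). The bounded queue applies back-pressure to the scanners whenever the single consumer falls behind.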


Thanks,

Quanlong


Re:Re:Re: Re: Impala hadoop variable

2017-07-25 Thread
Hi sky, 


Do you want to use your own Hadoop cluster instead of the mini cluster, for
example to test the latest version of Impala against your existing Hive
cluster? If so, you can modify the configuration files in
./fe/src/test/resources. They're just symbolic links, so you can point them
at your Hadoop configuration files, and the Impala cluster will then use your
Hadoop cluster.
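The relinking step could look like the minimal Java sketch below (ln -sf from a shell does the same job). RelinkConfigs is a hypothetical helper, not an Impala script. It assumes you pass the resources directory, your cluster's configuration directory, and the names of the files to repoint; which files ./fe/src/test/resources actually contains depends on your checkout, so no names are hard-coded. Since ./bin/create-test-configuration.sh generates these files, re-running it may overwrite the links again.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: repoint config links in one directory at files in another. */
public class RelinkConfigs {

  /**
   * For each name, replaces resourcesDir/name with a symbolic link to confDir/name.
   * Names with no matching file in confDir are left untouched and returned.
   */
  public static List<String> relink(Path resourcesDir, Path confDir, List<String> names)
      throws IOException {
    List<String> skipped = new ArrayList<>();
    for (String name : names) {
      Path target = confDir.resolve(name).toAbsolutePath();
      if (!Files.isRegularFile(target)) {
        skipped.add(name); // never create a link to a missing file
        continue;
      }
      Path link = resourcesDir.resolve(name);
      Files.deleteIfExists(link); // for a symlink this deletes the link, not what it points to
      Files.createSymbolicLink(link, target);
    }
    return skipped;
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 3) {
      System.err.println("usage: RelinkConfigs <resources-dir> <hadoop-conf-dir> <file>...");
      System.exit(2);
    }
    List<String> names = Arrays.asList(args).subList(2, args.length);
    List<String> skipped = relink(Paths.get(args[0]), Paths.get(args[1]), names);
    if (!skipped.isEmpty()) {
      System.err.println("not found in conf dir, skipped: " + skipped);
    }
  }
}
```

For example, with an assumed configuration directory of /etc/hadoop/conf: java RelinkConfigs ./fe/src/test/resources /etc/hadoop/conf core-site.xml hdfs-site.xml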

At 2017-07-18 18:14:32, "sky"  wrote:
>After I changed the impala-config.sh file, the HADOOP_CONF_DIR variable
>did not take effect, even after running ./bin/create-test-configuration.sh.
>The configuration is only recognized when I put the Hadoop configuration
>files in the same directory as ./bin/start-impala-cluster.sh. How can I make
>startup pick up the configuration without placing the files there?
>
>At 2017-07-18 02:33:11, "Tim Armstrong"  wrote:
>>I'm not sure that I fully understand the question.
>>
>>For the most part there isn't a way to override HADOOP_CONF_DIR: most
>>scripts source impala-config.sh.
>>
>>On Sun, Jul 16, 2017 at 8:31 PM, sky  wrote:
>>
>>> Hi Tim,
>>> I found that ./bin/create-test-configuration.sh generates the configurations
>>> in ./fe/src/test/resources, and the HADOOP_CONF_DIR variable also points to
>>> this directory. But changing this variable has no effect. Is it hard-coded?


Re:Re: Re: Failed to load test data about TPC-H

2017-06-01 Thread
OK. Thanks!




Quanlong

At 2017-06-02 00:21:40, "Tim Armstrong" wrote:

We don't test with mixed versions like that unfortunately.




Re:Re: Failed to load test data about TPC-H

2017-06-01 Thread
Hi Tim,


Thanks for your reply! I'll try these scripts later. One more question:
is the latest Impala compatible with the components in CDH 5.7.3, for
example Hadoop 2.6.0 and Hive 1.1.0?


We use the old cdh-5.7.3-release only because of compatibility concerns.


Thanks


Quanlong



At 2017-06-01 21:31:17, "Tim Armstrong"  wrote:
>Hi Quanlong,
>  It looks like you're missing the TPC-H data. In older versions of Impala
>you had to generate the data manually and put it in that directory. We've
>automated that in more recent versions (I think about a year ago).
>If you can switch to a newer version, then this will just work. Data
>loading is a lot more reliable now.
>
>Otherwise this is the script that generates the data. You can probably copy
>this script to your repository and run it by hand:
>
>https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpch/preload
>
>You will also need to do the same for TPC-DS:
>https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpcds/preload
>
>
>Cheers,
>Tim

Failed to load test data about TPC-H

2017-06-01 Thread
Hi friends,


I'm trying to run the Impala tests, following the wiki page 'How to load and
run Impala tests'. Although I only want to run some end-to-end tests, I know I
need to load the test data first, so I ran

./buildall.sh -noclean -testdata

It loaded the functional test data successfully but failed to load the TPC-H
data set. Here are the relevant logs:


/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
SUCCESS, data generated into /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/target
Loading Hive Builtins (logging to load-hive-builtins.log)... OK
Generating HBase data (logging to create-hbase.log)... OK
Creating /test-warehouse HDFS directory (logging to create-test-warehouse-dir.log)... OK
Starting Impala cluster (logging to start-impala-cluster.log)... OK
Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
Loading custom schemas (logging to load-custom-schemas.log)... OK
Loading functional-query data (logging to load-functional-query.log)... OK
Loading TPC-H data (logging to load-tpch.log)... FAILED
'load-data tpch core' failed. Tail of log:
Log for command 'load-data tpch core'
Loading workload 'tpch' Using exploration strategy 'core'. Logging to /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/cluster_logs/data_loading/data-load-tpch-core.log
Error loading data. The end of the log file is:
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Invalid path ''/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem'': No files matching path file:/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/impala-data/tpch/lineitem
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraints(LoadSemanticAnalyzer.java:139)
at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:230)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1176)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:134)
... 26 more


Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpch-core-hive-generated.sql
Error in /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-release/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
Error in ./buildall.sh at line 368: ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y


I'm using cdh5.7.3-release. The directory ${IMPALA_HOME}/testdata/impala-data
does not exist.
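The Hive error above comes from its LOAD DATA analysis (LoadSemanticAnalyzer) finding no files under testdata/impala-data/tpch/lineitem. A small pre-flight check like the following Java sketch can catch that before a long data-load run. TpchDataCheck is hypothetical, not part of Impala's scripts; it only verifies that the directory exists and contains at least one regular file, and it does not generate any data. The path layout is taken from the error message, and IMPALA_HOME is assumed to be set as it is for buildall.sh.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical pre-flight check (not part of Impala's scripts). */
public class TpchDataCheck {

  /** True if dir exists and contains at least one regular file (searched recursively). */
  public static boolean hasDataFiles(Path dir) throws IOException {
    if (!Files.isDirectory(dir)) {
      return false;
    }
    try (Stream<Path> entries = Files.walk(dir)) {
      return entries.anyMatch(Files::isRegularFile);
    }
  }

  public static void main(String[] args) throws IOException {
    String home = System.getenv("IMPALA_HOME");
    if (home == null) {
      System.err.println("IMPALA_HOME is not set");
      System.exit(2);
    }
    // Same location the failing LOAD DATA statement points at.
    Path lineitem = Paths.get(home, "testdata", "impala-data", "tpch", "lineitem");
    if (hasDataFiles(lineitem)) {
      System.out.println("OK: " + lineitem);
    } else {
      System.out.println("MISSING: " + lineitem + " (generate the TPC-H data first)");
      System.exit(1);
    }
  }
}
```

An exit status of 1 means the lineitem directory is missing or empty, which is the situation described here.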


Could you tell me how to generate this data set, or where I can download a
test-warehouse snapshot file so I can skip this step?


Thanks

Quanlong


