[Impala-CR](cdh5-2.6.0_5.8.0) PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache

2016-11-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: PREVIEW: IMPALA-4223: Handle truncated file read from HDFS cache
..


Patch Set 1:

This should probably go in ASF's master branch, then get cherry-picked.

-- 
To view, visit http://gerrit.cloudera.org:8080/4645
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1e1fdb0211819c5938956abb13b512350a46f1a
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


Re: Ideas for organizing 3.0

2016-11-02 Thread Jim Apple
Does anyone else have any thoughts, ideas, or concerns?

On Fri, Oct 28, 2016 at 10:04 AM, Jim Apple  wrote:
> I like this idea a lot.
>
> On Fri, Oct 28, 2016 at 10:01 AM, Tim Armstrong wrote:
>> I think we should also have a period leading up to the branching where we
>> can add incompatible changes guarded by flags. I think otherwise it will be
>> a headache trying to stage things (realistically some
>> compatibility-breaking changes will be ready early and we don't want to
>> have them sitting off to the side bit-rotting).
>>
>> One case is getting Impala to work against the latest Hadoop/Hive/HBase
>> APIs - there were incompatible changes but it would be great to have master
>> buildable against both versions.
>>
>> On Fri, Oct 28, 2016 at 9:51 AM, Jim Apple  wrote:
>>
>>> The most recent release was 2.7.0. We have 32 issues that we might
>>> want to tackle for 3.0:
>>>
>>> https://issues.cloudera.org/issues/?filter=11830
>>>
>>> Does anyone have any thoughts about how to organize this? For instance
>>> we might decide:
>>>
>>> 0. Starting immediately, the community is encouraged to submit issues
>>> that would break compatibility. Detailed designs are also encouraged.
>>>
>>> 1. After 2.9.0, commits that break compatibility will be allowed in
>>> the "master" branch.
>>>
>>> 2. After 2.9.0 a call will go out for anyone who wants to get a
>>> compatibility-breaking patch in that they have 3 months to do so.
>>>
>>> 3. After three months, we'll cut a new release candidate and bump all
>>> JIRA issues that would break compatibility to Target Version: Impala
>>> 4.0
>>>
>>> Thoughts?
>>>


[Toolchain-CR] IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.

2016-11-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.
..


Patch Set 1:

> My feeling is that we need to clearly distinguish between the
 > native-toolchain as a system for building native dependencies from
 > source in a self-contained reproducible way, and the S3 buckets as
 > a delivery mechanism.

I don't understand how this patch conflates them.

 > It's just
 > coincidence that one was available on the systems that it was built
 > on because those VM images are prepopulated with various JDKs.

Agreed. And putting a JVM in them is a rabbit hole I'd rather not go down.

 > We need to have a JDK for the Impala build (which is not the system
 > JDK)

Not the system JDK? Why not?

 > , so I think that actually is an argument to build it in the
 > Impala repository, which is the only step of the build now that
 > requires a JDK to be present.

Is "it" dita-ot in this sentence?

 > I also think it would be a mistake to start treating the
 > impala-setup repository as an essential build step instead of a
 > convenience.

I don't think adding dita-ot to impala-setup treats impala-setup as 
essential. Anyone who already had dita-ot installed could presumably continue 
to use it.

 > What's the vision for the docs build? Will it be part of the Impala
 > source repo?

Yes, the docs will be part of the Impala source repo.

-- 
To view, visit http://gerrit.cloudera.org:8080/4902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic1110b8c8dc5a9333143055afd49734fc336a1f0
Gerrit-PatchSet: 1
Gerrit-Project: Toolchain
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Toolchain-CR] IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.

2016-11-02 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.
..


Patch Set 1:

My feeling is that we need to clearly distinguish between the native-toolchain 
as a system for building native dependencies from source in a self-contained 
reproducible way, and the S3 buckets as a delivery mechanism. I don't think 
they're the same thing and I think conflating them will cause problems.

native-toolchain should not require any Java compilers. It's just coincidence 
that one was available on the systems that it was built on because those VM 
images are prepopulated with various JDKs.

We need to have a JDK for the Impala build (which is not the system JDK), so I 
think that actually is an argument to build it in the Impala repository, which 
is the only step of the build now that requires a JDK to be present.

I also think it would be a mistake to start treating the impala-setup 
repository as an essential build step instead of a convenience.

What's the vision for the docs build? Will it be part of the Impala source 
repo?



Re: Bootstrapping a impala dev failed on fresh installed Ubuntu 14.04

2016-11-02 Thread Jim Apple
https://issues.cloudera.org/browse/IMPALA-4421

On Wed, Nov 2, 2016 at 6:46 AM, Laszlo Gaal  wrote:
> Hi Amos,
>
> I've encountered this malformed directory name as well. The quick workaround
> is to delete it and create the subdir with the correct name:
> 1. rm -rf  ${IMPALA_HOME}/tests/*RESUL*
> 2. mkdir ${IMPALA_HOME}/tests/results
>
> This will create the correct subdirectory under tests.
>
> Hope this helps,
>
> - LaszloG
>
> On Wed, Nov 2, 2016 at 2:02 PM, Amos Bird  wrote:
>
>>
>> Ah, re-login does the trick. Thanks for your help ;).
>>
>> However, the e2e test yells so many errors.
>>
>> 1) the name of the directory containing the error log is strange. It
>>  literally looks like this:
>> tests/"${RESULTS_DIR}/TEST-impala-custom-cluster.log"
>>
>> 2) the commit I tested is 7fc31b534d4c5cb118c559e16556a6c1ae6ca7fc
>>
>> 3) when executing tests/run-tests.py, it gave:
>> -
>> Traceback (most recent call last):
>>   File "./tests/run-tests.py", line 94, in <module>
>>     test_executor.run_tests(args)
>>   File "./tests/run-tests.py", line 63, in run_tests
>>     exit_code = pytest.main(args)
>>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 32, in main
>>     config = _prepareconfig(args, plugins)
>>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 78, in _prepareconfig
>>     args = shlex.split(args)
>>   File "/usr/lib/python2.7/shlex.py", line 279, in split
>>     return list(lex)
>>   File "/usr/lib/python2.7/shlex.py", line 269, in next
>>     token = self.get_token()
>>   File "/usr/lib/python2.7/shlex.py", line 96, in get_token
>>     raw = self.read_token()
>>   File "/usr/lib/python2.7/shlex.py", line 172, in read_token
>>     raise ValueError, "No closing quotation"
>> ValueError: No closing quotation
>> -
>>
>> 4) when executing "MAX_PYTEST_FAILURES=12345678 ./bin/run-all-tests.sh",
>> the be and fe tests pass, but the e2e tests fail a lot. Log files are attached.
>>
>> I'm referring to this
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>>
>> regards,
>> Amos
>>
>>
>>
>> Lars Volker writes:
>>
>> > Yes, this is already committed to the impala-setup repo and I used it
>> > yesterday on a fresh Ubuntu 14.04 machine with success.
>> >
>> > Amos, after running impala-setup you will need to re-login to make sure the
>> > changes made to the system limits are effective. You can check them by
>> > running "ulimit -n" in your shell.
>> >
>> > On Wed, Nov 2, 2016 at 5:48 AM, Jim Apple  wrote:
>> >
>> >> Isn't that already part of the script?
>> >>
>> >> https://github.com/awleblang/impala-setup/commit/56fa829c99e997585eb63fcd49cb65eb8357e679
>> >>
>> >> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=blob;f=bin/bootstrap_development.sh;h=8c4f742ae058f8017858d2a749e8824be58bd410;hb=HEAD#l68
>> >>
>> >> On Tue, Nov 1, 2016 at 9:44 PM, Dimitris Tsirogiannis wrote:
>> >> > Hi Amos,
>> >> >
>> >> > You need to increase your limits (/etc/security/limits.conf) for max
>> >> > number of open files (nofile). Use a pretty big number (e.g. 500K) for
>> >> > both soft and hard.
>> >> >
>> >> > Hope that helps.
>> >> >
>> >> > Dimitris
>> >> >
>> >> > On Tue, Nov 1, 2016 at 8:57 PM, Amos Bird  wrote:
>> >> >>
>> >> >> Hi there,
>> >> >>
>> >> >> After days of efforts to make impala's local tests work on my Centos
>> >> >> machine, I finally gave up and turned to Ubuntu. I followed this simple
>> >> >> guide
>> >> >> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
>> >> >> on a freshly installed Ubuntu 14.04. Unfortunately there are still errors
>> >> >> in the data loading phase. Here is the error log,
>> >> >>
>> >> >> 
>> >> >> -
>> >> >> Loading Kudu TPCH (logging to /home/amos/impala/logs/data_loading/load-kudu-tpch.log)... FAILED
>> >> >> 'load-data tpch core kudu/none/none force' failed. Tail of log:
>> >> >> distribute by hash (c_custkey) into 9 buckets stored as kudu
>> >> >>
>> >> >> (load-tpch-core-impala-generated-kudu-none-none.sql):
>> >> >>
>> >> >>
>> >> >> Executing HBase Command: hbase shell load-tpch-core-hbase-generated.create
>> >> >> 16/11/02 01:07:58 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
>> >> >> SLF4J: Class path contains multiple SLF4J bindings.
>> >> >> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hbase-1.2.0-cdh5.10.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> >> >> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.10.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/

[Toolchain-CR] IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.

2016-11-02 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change.

Change subject: IMPALA-3399: Add DITA Open Toolkit to build Impala user docs.
..


Patch Set 1:

> I think we can avoid making this part of native-toolchain since it
 > looks like pure Java code that doesn't depend on any other parts of
 > the toolchain and isn't platform-dependent.

I was adding this to the toolchain not only for platform-dependency issues, but 
also to make it easy for toolchain users to get all of the tools needed for 
build and test.

Also, I imagine it might depend on your JDK, which one might consider part of 
the platform.

 > This would also make the native toolchain dependent on Gradle,
 > which I believe will download things from the internet, rather than
 > just our S3 source buckets.

Yeah, that's not great. Once we decide where to put this, I will try to fix 
that with "./gradlew dist".

 > How about we just add the build script to the Impala repository? I
 > could see hosting the source on S3 but it might also be good to
 > support building directly from the original github tag.

Maybe it should be part of https://github.com/awleblang/impala-setup? It seems 
like that's where the rest of the scripts to download and install third-party 
dependencies are, other than this Toolchain repo.



Re: Bootstrapping a impala dev failed on fresh installed Ubuntu 14.04

2016-11-02 Thread Jim Apple
> Amos, after running impala-setup you will need to re-login to make sure the
> changes made to the system limits are effective. You can check them by
> running "ulimit -n" in your shell.

Filed https://issues.cloudera.org/browse/IMPALA-4419 to improve the script.
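
For anyone who wants to check the limit from a script rather than with "ulimit -n", here is a minimal Python 3 sketch. The 500000 threshold only echoes the "e.g. 500K" suggestion made earlier in this thread; it is an assumption, not a documented Impala requirement, and the helper names are made up for illustration.

```python
import resource

# Hypothetical threshold, taken from the "e.g. 500K" suggestion in this thread.
SUGGESTED_NOFILE = 500000


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_sufficient(limit, wanted=SUGGESTED_NOFILE):
    """True if a limit value is unlimited or at least `wanted`."""
    return limit == resource.RLIM_INFINITY or limit >= wanted


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft nofile limit:", soft)
    print("hard nofile limit:", hard)
    if not limit_is_sufficient(soft):
        print("soft limit is below", SUGGESTED_NOFILE, "(a re-login may be needed)")
```

The soft limit is the value that "ulimit -n" reports by default, so that is the one to compare after logging in again.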


Re: Bootstrapping a impala dev failed on fresh installed Ubuntu 14.04

2016-11-02 Thread Laszlo Gaal
Hi Amos,

I've encountered this malformed directory name as well. The quick workaround
is to delete it and create the subdir with the correct name:
1. rm -rf  ${IMPALA_HOME}/tests/*RESUL*
2. mkdir ${IMPALA_HOME}/tests/results

This will create the correct subdirectory under tests.
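
For anyone who prefers to script this, the two steps above can be sketched in Python roughly as follows. This is a minimal sketch, assuming Python 3 and that IMPALA_HOME is set in the environment; the function name reset_results_dir is made up for illustration and is not part of the Impala tree.

```python
import glob
import os
import shutil


def reset_results_dir(impala_home):
    """Remove entries under tests/ matching *RESUL* and create tests/results."""
    tests_dir = os.path.join(impala_home, "tests")
    # Escape the directory part so only the *RESUL* suffix acts as a pattern.
    pattern = os.path.join(glob.escape(tests_dir), "*RESUL*")
    for path in glob.glob(pattern):
        # Mirrors `rm -rf`: handle both directories and plain files.
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    results = os.path.join(tests_dir, "results")
    # Like `mkdir`, but tolerate the directory already existing.
    os.makedirs(results, exist_ok=True)
    return results


if __name__ == "__main__":
    print(reset_results_dir(os.environ["IMPALA_HOME"]))
```

Like the shell version, the pattern is case-sensitive on Linux, so it removes the malformed entry containing "RESUL" but leaves a lowercase "results" directory alone.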

Hope this helps,

- LaszloG
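
P.S. The "No closing quotation" error in the traceback quoted below is what Python's shlex.split raises whenever the string it is given contains an unbalanced quote. The sketch below reproduces that behavior in isolation (Python 3). The argument strings are invented for illustration and are not the actual arguments that run-tests.py passed to pytest, and try_split is a hypothetical helper.

```python
import shlex


def try_split(arg_string):
    """Split a shell-style string, returning (tokens, error message)."""
    try:
        return shlex.split(arg_string), None
    except ValueError as err:
        return None, str(err)


if __name__ == "__main__":
    # Balanced quotes split cleanly.
    print(try_split('--junitxml="results/out.xml" -x'))
    # A stray double quote leaves the lexer waiting for the closing one.
    print(try_split('--junitxml="results/out.xml -x'))
```

This would be consistent with a stray quote character ending up in the argument string, much like the one visible in the malformed directory name, though the thread does not confirm the exact cause.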

On Wed, Nov 2, 2016 at 2:02 PM, Amos Bird  wrote:

>
> Ah, re-login does the trick. Thanks for your help ;).
>
> However, the e2e test yells so many errors.
>
> 1) the name of the directory containing the error log is strange. It
>  literally looks like this:
> tests/"${RESULTS_DIR}/TEST-impala-custom-cluster.log"
>
> 2) the commit I tested is 7fc31b534d4c5cb118c559e16556a6c1ae6ca7fc
>
> 3) when executing tests/run-tests.py, it gave:
> -
> Traceback (most recent call last):
>   File "./tests/run-tests.py", line 94, in <module>
>     test_executor.run_tests(args)
>   File "./tests/run-tests.py", line 63, in run_tests
>     exit_code = pytest.main(args)
>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 32, in main
>     config = _prepareconfig(args, plugins)
>   File "/home/amos/impala/infra/python/env/local/lib/python2.7/site-packages/_pytest/config.py", line 78, in _prepareconfig
>     args = shlex.split(args)
>   File "/usr/lib/python2.7/shlex.py", line 279, in split
>     return list(lex)
>   File "/usr/lib/python2.7/shlex.py", line 269, in next
>     token = self.get_token()
>   File "/usr/lib/python2.7/shlex.py", line 96, in get_token
>     raw = self.read_token()
>   File "/usr/lib/python2.7/shlex.py", line 172, in read_token
>     raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> -
>
> 4) when executing "MAX_PYTEST_FAILURES=12345678 ./bin/run-all-tests.sh",
> the be and fe tests pass, but the e2e tests fail a lot. Log files are attached.
>
> I'm referring to this
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>
> regards,
> Amos
>
>
>
> Lars Volker writes:
>
> > Yes, this is already committed to the impala-setup repo and I used it
> > yesterday on a fresh Ubuntu 14.04 machine with success.
> >
> > Amos, after running impala-setup you will need to re-login to make sure the
> > changes made to the system limits are effective. You can check them by
> > running "ulimit -n" in your shell.
> >
> > On Wed, Nov 2, 2016 at 5:48 AM, Jim Apple  wrote:
> >
> >> Isn't that already part of the script?
> >>
> >> https://github.com/awleblang/impala-setup/commit/56fa829c99e997585eb63fcd49cb65eb8357e679
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=blob;f=bin/bootstrap_development.sh;h=8c4f742ae058f8017858d2a749e8824be58bd410;hb=HEAD#l68
> >>
> >> On Tue, Nov 1, 2016 at 9:44 PM, Dimitris Tsirogiannis wrote:
> >> > Hi Amos,
> >> >
> >> > You need to increase your limits (/etc/security/limits.conf) for max
> >> > number of open files (nofile). Use a pretty big number (e.g. 500K) for
> >> > both soft and hard.
> >> >
> >> > Hope that helps.
> >> >
> >> > Dimitris
> >> >
> >> > On Tue, Nov 1, 2016 at 8:57 PM, Amos Bird  wrote:
> >> >>
> >> >> Hi there,
> >> >>
> >> >> After days of efforts to make impala's local tests work on my Centos
> >> >> machine, I finally gave up and turned to Ubuntu. I followed this simple
> >> >> guide
> >> >> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrapping+an+Impala+Development+Environment+From+Scratch
> >> >> on a freshly installed Ubuntu 14.04. Unfortunately there are still errors
> >> >> in the data loading phase. Here is the error log,
> >> >>
> >> >> 
> >> >> -
> >> >> Loading Kudu TPCH (logging to /home/amos/impala/logs/data_loading/load-kudu-tpch.log)... FAILED
> >> >> 'load-data tpch core kudu/none/none force' failed. Tail of log:
> >> >> distribute by hash (c_custkey) into 9 buckets stored as kudu
> >> >>
> >> >> (load-tpch-core-impala-generated-kudu-none-none.sql):
> >> >>
> >> >>
> >> >> Executing HBase Command: hbase shell load-tpch-core-hbase-generated.create
> >> >> 16/11/02 01:07:58 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
> >> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> >> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hbase-1.2.0-cdh5.10.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> >> SLF4J: Found binding in [jar:file:/home/amos/impala/toolchain/cdh_components/hadoop-2.6.0-cdh5.10.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >> >> Executing HBase Command: hbase sh