[Impala-CR](cdh5-trunk) IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks .. Patch Set 16: I ended up simplifying the interface by removing DataSink::GetNextRowBatch() and removing the JoinBuildSink interface entirely. TPC-H benchmark results were neutral, so it doesn't look like performance is negatively affected (I think skipping the timers probably compensates for the slight inefficiency of RowBatch::AcquireState()). I put the latest patch set through the battery of tests described in the commit message. -- To view, visit http://gerrit.cloudera.org:8080/3282 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9d7608181eeacfe706a09c1e153d0a3e1ee9b475 Gerrit-PatchSet: 16 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks
Tim Armstrong has uploaded a new patch set (#16). Change subject: IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks ..

IMPALA-3567: Part 1: groundwork to make Join build sides DataSinks

Refactor the DataSink interface to be more generic. We need more flexibility in setting up MemTrackers, so that memory is accounted against the right ExecNode. Also remove some redundancy between DataSink subclasses in setting up RuntimeProfiles, etc.

Remove the redundancy in the DataSink between passing eos to GetNext() and FlushFinal(). This simplifies HdfsTableSink quite a bit and makes handling empty batches simpler.

Partially refactor the join nodes so that the control flow between BlockingJoinNode::Open() and its subclasses is easier to follow. BlockingJoinNode now calls only one virtual function on its subclasses: ConstructBuildSide(). Once we convert all join nodes to use the DataSink interface, we will be able to remove that as well.

As a minor optimisation, avoid updating a timer that is ignored for non-async builds.

As a proof of concept, this patch separates out the build side of NestedLoopJoinNode into a class that implements the DataSink interface. The main challenge here is that NestedLoopJoinNode recycles row batches to avoid reallocations and copies of row batches in subplans. The solution to this is:
- Retain the special-case optimisation for SingularRowSrc
- Use the row batch cache and RowBatch::AcquireState() to copy the state of row batches passed to Send(), while recycling the RowBatch objects.

Refactoring the partitioned hash join is left for Part 2.

Testing: Ran exhaustive, core ASAN, and exhaustive non-partitioned agg/join builds. Also ran a local stress test.

Performance: Ran TPC-H nested locally. The results show that the change is perf-neutral.
+------------------+-----------------------+---------+------------+------------+----------------+
| Workload         | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+------------------+-----------------------+---------+------------+------------+----------------+
| TPCH_NESTED(_20) | parquet / none / none | 7.75    | +0.19%     | 5.64       | -0.34%         |
+------------------+-----------------------+---------+------------+------------+----------------+

+------------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| Workload         | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
+------------------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------------+-------+
| TPCH_NESTED(_20) | TPCH-Q17 | parquet / none / none | 18.96  | 17.95       | +5.61%     | 4.97%     | 0.71%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q14 | parquet / none / none | 3.61   | 3.56        | +1.25%     | 0.97%     | 1.19%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q8  | parquet / none / none | 6.25   | 6.23        | +0.44%     | 0.44%     | 0.90%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q10 | parquet / none / none | 5.84   | 5.83        | +0.30%     | 1.21%     | 1.82%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q5  | parquet / none / none | 4.91   | 4.90        | +0.11%     | 0.18%     | 0.78%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q21 | parquet / none / none | 17.82  | 17.81       | +0.07%     | 0.66%     | 0.58%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q4  | parquet / none / none | 5.12   | 5.12        | -0.02%     | 0.97%     | 1.23%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q9  | parquet / none / none | 23.85  | 23.88       | -0.15%     | 0.72%     | 0.22%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q12 | parquet / none / none | 6.15   | 6.16        | -0.16%     | 1.60%     | 1.72%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q3  | parquet / none / none | 5.46   | 5.47        | -0.23%     | 1.28%     | 0.90%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q16 | parquet / none / none | 3.61   | 3.62        | -0.26%     | 1.00%     | 1.36%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q19 | parquet / none / none | 20.19  | 20.31       | -0.58%     | 1.67%     | 0.65%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q7  | parquet / none / none | 9.42   | 9.48        | -0.68%     | 0.87%     | 0.71%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q18 | parquet / none / none | 12.94  | 13.06       | -0.90%     | 0.59%     | 0.48%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q22 | parquet / none / none | 1.09   | 1.10        | -0.92%     | 2.26%     | 2.22%          | 1           | 10    |
| TPCH_NESTED(_20) | TPCH-Q13 |
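The row-batch recycling described in the commit message can be sketched roughly as follows. This is a simplified illustration, not Impala's actual classes: `RowBatch` here is a tiny stand-in and `NljBuilderSketch` is a hypothetical name; only the ideas of a row batch cache and `RowBatch::AcquireState()` come from the commit message.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Simplified stand-in for Impala's RowBatch. The real class transfers tuple
// memory and I/O buffers; a vector of ints models that state here.
class RowBatch {
 public:
  // Takes over 'src's state without copying row data, leaving 'src' empty
  // but reusable by its owner -- mirrors the idea of RowBatch::AcquireState().
  void AcquireState(RowBatch* src) {
    rows_ = std::move(src->rows_);
    src->rows_.clear();
  }
  void AddRow(int row) { rows_.push_back(row); }
  size_t num_rows() const { return rows_.size(); }

 private:
  std::vector<int> rows_;
};

// Hypothetical build-side sink: accumulates input batches, recycling RowBatch
// objects from a cache instead of allocating a fresh one per Send() call.
class NljBuilderSketch {
 public:
  void Send(RowBatch* input) {
    std::unique_ptr<RowBatch> batch =
        batch_cache_.empty() ? std::make_unique<RowBatch>() : PopCache();
    batch->AcquireState(input);  // steal state; caller can reuse 'input'
    build_batches_.push_back(std::move(batch));
  }
  size_t num_build_batches() const { return build_batches_.size(); }

 private:
  std::unique_ptr<RowBatch> PopCache() {
    auto b = std::move(batch_cache_.back());
    batch_cache_.pop_back();
    return b;
  }
  std::vector<std::unique_ptr<RowBatch>> build_batches_;
  std::vector<std::unique_ptr<RowBatch>> batch_cache_;
};
```

The point of the pattern is that the caller's batch object is never retained by the sink, so subplans can keep recycling their own batches while the build side still avoids copying row data.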
Re: HBase errors prevent run-all-tests.sh
One idea is to check your ulimit for file descriptors and run `lsof | wc -l` to see if you have for some reason exceeded the limit. Otherwise, a fresh reboot might help to figure out whether a stray process somewhere is hogging FDs. On Sun, Jul 24, 2016 at 8:09 PM Jim Apple wrote: > The NN and DN logs are empty. > > I ran bin/kill-all.sh at the beginning of this, so I assume that nothing > is taking them except for my little Impala work. > > On Sun, Jul 24, 2016 at 8:03 PM, Bharath Vissapragada > wrote: > > Based on > > > > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing > > remote block reader. > > java.net.SocketException: Too many open files > > > > 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to > > /127.0.0.1:31000 for block, add to deadNodes and continue. > > java.net.SocketException: Too many open files > > > > I'm guessing your hdfs instance might be overloaded (check the NN/DN > logs). > > HMaster is unable to connect to NN while opening regions and hence > throwing > > the error. > > > > On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple wrote: > > > >> Several thousand lines of things like > >> > >> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4): > >> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337 > >> > >> java.lang.NullPointerException at > >> > >> > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:126) > >> ... > >> > >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: > >> > >> > BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, > >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): > >> error creating ShortCircuitReplica. > >> > >> java.io.EOFException: unexpected EOF while reading metadata file header > >> > >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing > >> remote block reader. 
> >> java.net.SocketException: Too many open files > >> > >> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to > >> /127.0.0.1:31000 for block, add to deadNodes and continue. > >> java.net.SocketException: Too many open files > >> > >> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain > >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any > >> node: java.io.IOException: No live nodes contain block > >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 a > >> fter checking nodes = > >> [DatanodeInfoWithStorage[127.0.0.1:31000 > >> ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]], > >> ignoredNodes = null No live nodes contain current block Block > >> locations: DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-551 > >> 2-4827-bcaf-c922f1e65eb1,DISK] Dead nodes: > >> DatanodeInfoWithStorage[127.0.0.1:31000 > >> ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]. > >> Will get new block locations from namenode and retry... > >> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 > >> IOException, will wait for 2772.7114628272548 msec. > >> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory: > >> > >> > BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, > >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): > >> error creating ShortCircuitReplica. > >> java.io.IOException: Illegal seek > >> at sun.nio.ch.FileDispatcherImpl.pread0(Native Method) > >> > >> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada > >> wrote: > >> > Do you see something in the HMaster log? From the error it looks like > the > >> > Hbase master hasn't started properly for some reason. 
> >> > > >> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple > wrote: > >> > > >> >> I tried reloading the data with > >> >> > >> >> ./bin/load-data.py --workloads functional-query > >> >> > >> >> but that gave errors like > >> >> > >> >> Executing HBase Command: hbase shell > >> >> load-functional-query-core-hbase-generated.create > >> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib > is > >> >> deprecated. Instead, use io.native.lib.available > >> >> SLF4J: Class path contains multiple SLF4J bindings. > >> >> SLF4J: Found binding in > >> >> > >> >> > >> > [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> >> SLF4J: Found binding in > >> >> > >> >> > >> > [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > >> >> explanation. > >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > >> >> > >> >> ERROR: Can't get the locations > >> >> > >> >> Here is some help for this command: > >> >> Start disable of named table: > >> >> hbase> disable 't1'
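The ulimit/lsof check suggested above, spelled out (assuming a Linux box, like the Ubuntu 14.04 machine in this thread):

```shell
# Per-process file-descriptor soft limit for this shell:
ulimit -n

# System-wide FD usage on Linux: allocated, unused, and the global maximum.
cat /proc/sys/fs/file-nr

# Rough count of open files across all visible processes:
lsof 2>/dev/null | wc -l
```

If the lsof count is close to the ulimit for a single process (check `ls /proc/<pid>/fd | wc -l` for the suspect process), that process is the likely culprit behind the "Too many open files" errors.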
[Impala-CR](cdh5-trunk) IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
anujphadke has uploaded a new patch set (#2). Change subject: IMPALA-2700: ASCII NUL characters are doubled on insert into text tables ..

IMPALA-2700: ASCII NUL characters are doubled on insert into text tables

Currently the scanner processes the '\0' character as a non-special character, whereas the writer treats it as a special character. The writer prepends an escape character before writing it, which causes ASCII NUL characters to be doubled, since '\0' is the default escape character. This adds a check to treat '\0' as a non-special character in the writer.

Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc --- M be/src/exec/hdfs-text-table-writer.cc 1 file changed, 7 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/03/3703/2 -- To view, visit http://gerrit.cloudera.org:8080/3703 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: anujphadke Gerrit-Reviewer: Tim Armstrong
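The doubling can be illustrated with a small sketch. `WriteField` below is a made-up helper, not the hdfs-text-table-writer code: when the escape character is the default '\0' and '\0' is also treated as special, every NUL gets an escape byte prepended, i.e. it is written twice.

```cpp
#include <cassert>
#include <string>

// Illustrative sketch of an escaping text writer (not Impala's actual code).
// It prefixes "special" characters with the escape character. With the
// default escape character '\0', treating '\0' itself as special doubles
// every NUL in the output; the fix is to stop escaping '\0'.
std::string WriteField(const std::string& field, char escape_char,
                       bool nul_is_special) {
  std::string out;
  for (char c : field) {
    // A character needs escaping if it equals the escape character,
    // except that '\0' is exempt unless nul_is_special (the buggy mode).
    bool needs_escape = (c == escape_char) && (c != '\0' || nul_is_special);
    if (needs_escape) out.push_back(escape_char);
    out.push_back(c);
  }
  return out;
}
```

The scanner never strips the extra '\0' on read, which is why the data round-trips doubled rather than merely being stored escaped.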
Re: HBase errors prevent run-all-tests.sh
The NN and DN logs are empty. I ran bin/kill-all.sh at the beginning of this, so I assume that nothing is taking them except for my little Impala work. On Sun, Jul 24, 2016 at 8:03 PM, Bharath Vissapragada wrote: > Based on > > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing > remote block reader. > java.net.SocketException: Too many open files > > 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to > /127.0.0.1:31000 for block, add to deadNodes and continue. > java.net.SocketException: Too many open files > > I'm guessing your hdfs instance might be overloaded (check the NN/DN logs). > HMaster is unable to connect to NN while opening regions and hence throwing > the error. > > On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple wrote: > >> Several thousand lines of things like >> >> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4): >> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337 >> >> java.lang.NullPointerException at >> >> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:126) >> ... >> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: >> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): >> error creating ShortCircuitReplica. >> >> java.io.EOFException: unexpected EOF while reading metadata file header >> >> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing >> remote block reader. >> java.net.SocketException: Too many open files >> >> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to >> /127.0.0.1:31000 for block, add to deadNodes and continue. 
>> java.net.SocketException: Too many open files >> >> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any >> node: java.io.IOException: No live nodes contain block >> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 a >> fter checking nodes = >> [DatanodeInfoWithStorage[127.0.0.1:31000 >> ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]], >> ignoredNodes = null No live nodes contain current block Block >> locations: DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-551 >> 2-4827-bcaf-c922f1e65eb1,DISK] Dead nodes: >> DatanodeInfoWithStorage[127.0.0.1:31000 >> ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]. >> Will get new block locations from namenode and retry... >> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 >> IOException, will wait for 2772.7114628272548 msec. >> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory: >> >> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, >> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): >> error creating ShortCircuitReplica. >> java.io.IOException: Illegal seek >> at sun.nio.ch.FileDispatcherImpl.pread0(Native Method) >> >> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada >> wrote: >> > Do you see something in the HMaster log? From the error it looks like the >> > Hbase master hasn't started properly for some reason. >> > >> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple wrote: >> > >> >> I tried reloading the data with >> >> >> >> ./bin/load-data.py --workloads functional-query >> >> >> >> but that gave errors like >> >> >> >> Executing HBase Command: hbase shell >> >> load-functional-query-core-hbase-generated.create >> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is >> >> deprecated. Instead, use io.native.lib.available >> >> SLF4J: Class path contains multiple SLF4J bindings. 
>> >> SLF4J: Found binding in >> >> >> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >> >> SLF4J: Found binding in >> >> >> >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >> >> explanation. >> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >> >> >> >> ERROR: Can't get the locations >> >> >> >> Here is some help for this command: >> >> Start disable of named table: >> >> hbase> disable 't1' >> >> hbase> disable 'ns1:t1' >> >> >> >> ERROR: Can't get master address from ZooKeeper; znode data == null >> >> >> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple >> wrote: >> >> > I'm having trouble with my HBase environment, and it's preventing me >> >> > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have tried >> >> > this with a clean build, and I have tried unset LD_LIBRARY_PATH && >> >> > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh >> >> > >> >> > Here is the error I
Re: HBase errors prevent run-all-tests.sh
Based on 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader. java.net.SocketException: Too many open files 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to /127.0.0.1:31000 for block, add to deadNodes and continue. java.net.SocketException: Too many open files I'm guessing your hdfs instance might be overloaded (check the NN/DN logs). HMaster is unable to connect to NN while opening regions and hence throwing the error. On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple wrote: > Several thousand lines of things like > > WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4): > failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337 > > java.lang.NullPointerException at > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:126) > ... > > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: > > BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, > block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): > error creating ShortCircuitReplica. > > java.io.EOFException: unexpected EOF while reading metadata file header > > 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing > remote block reader. > java.net.SocketException: Too many open files > > 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to > /127.0.0.1:31000 for block, add to deadNodes and continue. 
> java.net.SocketException: Too many open files > > 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain > BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any > node: java.io.IOException: No live nodes contain block > BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 a > fter checking nodes = > [DatanodeInfoWithStorage[127.0.0.1:31000 > ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]], > ignoredNodes = null No live nodes contain current block Block > locations: DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-551 > 2-4827-bcaf-c922f1e65eb1,DISK] Dead nodes: > DatanodeInfoWithStorage[127.0.0.1:31000 > ,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]. > Will get new block locations from namenode and retry... > 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 > IOException, will wait for 2772.7114628272548 msec. > 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory: > > BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, > block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): > error creating ShortCircuitReplica. > java.io.IOException: Illegal seek > at sun.nio.ch.FileDispatcherImpl.pread0(Native Method) > > On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada > wrote: > > Do you see something in the HMaster log? From the error it looks like the > > Hbase master hasn't started properly for some reason. > > > > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple wrote: > > > >> I tried reloading the data with > >> > >> ./bin/load-data.py --workloads functional-query > >> > >> but that gave errors like > >> > >> Executing HBase Command: hbase shell > >> load-functional-query-core-hbase-generated.create > >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is > >> deprecated. Instead, use io.native.lib.available > >> SLF4J: Class path contains multiple SLF4J bindings. 
> >> SLF4J: Found binding in > >> > >> > [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> SLF4J: Found binding in > >> > >> > [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > >> explanation. > >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > >> > >> ERROR: Can't get the locations > >> > >> Here is some help for this command: > >> Start disable of named table: > >> hbase> disable 't1' > >> hbase> disable 'ns1:t1' > >> > >> ERROR: Can't get master address from ZooKeeper; znode data == null > >> > >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple > wrote: > >> > I'm having trouble with my HBase environment, and it's preventing me > >> > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have tried > >> > this with a clean build, and I have tried unset LD_LIBRARY_PATH && > >> > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh > >> > > >> > Here is the error I get from compute stats: > >> > (./testdata/bin/compute-table-stats.sh) > >> > > >> > Executing: compute stats functional_hbase.alltypessmall > >> > -> Error: ImpalaBeeswaxException: > >> > Query aborted:RuntimeException: couldn't retrieve HBase table > >> > (functional_hbase.alltypessmall) info: > >> > Unable to find region for in
Re: IMPALA-2428 Support multiple-character string as the field delimiter
Hello, Jim Apple. I have tested the original delimiter settings, and the logs show the differences between my commit and the original setting, as below:

My commit: Field terminators can't be an empty string.
Original:  All terminators can't be empty. (I will enhance my restriction to match this in my next patch.)

My commit: Tuple delimiter can't be the first byte of the field delimiter.
Original:  Field delimiter and line delimiter can't be the same value. (So these two restrictions are actually the same one.)

My commit: Escaped char can't be the first byte of the field delimiter.
Original:  Warning: Escaped char will be ignored. (I will relax my restriction to this in my next patch.)

My commit: No restriction for escaped char and line terminator.
Original:  Warning: Escaped char will be ignored. (I will add this warning in my next patch.)

My commit: Terminator can't contain '\0'.
Original:  ImpalaRuntimeException. (See logs for detail; I added this restriction to fix this runtime exception.)

Detailed logs:

Terminator is an empty string

[nobida147:21000] > create table field_null(id int) row format delimited fields terminated by "";
Query: create table field_null(id int) row format delimited fields terminated by ""
Query submitted at: 2016-07-25 10:20:41 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]:

[nobida147:21000] > create table line_null(id int) row format delimited lines terminated by "";
Query: create table line_null(id int) row format delimited lines terminated by ""
Query submitted at: 2016-07-25 10:20:54 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]:

[nobida147:21000] > create table escape_null(id int) row format delimited escaped by "";
Query: create table escape_null(id int) row format delimited escaped by ""
Query submitted at: 2016-07-25 10:21:13 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: 
ESCAPED BY values and LINE/FIELD terminators must be specified as a single character or as a decimal value in the range [-128:127]:

Field delimiter and line delimiter have same value

[nobida147:21000] > create table line_equal_field(id int) row format delimited fields terminated by "," lines terminated by ",";
Query: create table line_equal_field(id int) row format delimited fields terminated by "," lines terminated by ","
Query submitted at: 2016-07-25 10:23:45 (Coordinator: http://0.0.0.0:25000)
ERROR: AnalysisException: Field delimiter and line delimiter have same value: byte 44

Field delimiter and escape character have same value

[nobida147:21000] > create table escape_equal_field(id int) row format delimited fields terminated by "," escaped by ",";
Query: create table escape_equal_field(id int) row format delimited fields terminated by "," escaped by ","
Query submitted at: 2016-07-25 10:22:48 (Coordinator: http://0.0.0.0:25000)
Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=924c6b616e183f62:7c4779a423b29d96
++
||
++
++
WARNINGS: Field delimiter and escape character have same value: byte 44. Escape character will be ignored
Fetched 0 row(s) in 0.16s

Line delimiter and escape character have same value

[nobida147:21000] > create table escape_equal_line(id int) row format delimited escaped by "," lines terminated by ',';
Query: create table escape_equal_line(id int) row format delimited escaped by "," lines terminated by ','
Query submitted at: 2016-07-25 10:23:21 (Coordinator: http://0.0.0.0:25000)
Query progress can be monitored at: http://0.0.0.0:25000/query_plan?query_id=f443df31f58860bb:1c01f402050f35b3
++
||
++
++
WARNINGS: Line delimiter and escape character have same value: byte 44. Escape character will be ignored
Fetched 0 row(s) in 0.13s

Delimiter contains '\0'

[nobida147:21000] > create table contains_zero(id int) row format delimited fields terminated by "\0";
Query: create table contains_zero(id int) row format delimited fields terminated by "\0"
Query submitted at: 2016-07-25 10:08:39 (Coordinator: http://0.0.0.0:25000)
ERROR: ImpalaRuntimeException: Error making 'createTable' RPC to Hive Metastore:
CAUSED BY: MetaException: javax.jdo.JDODataStoreException: Put request failed : INSERT INTO "SERDE_PARAMS" ("PARAM_VALUE","SERDE_ID","PARAM_KEY") VALUES (?,?,?)
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:902)
at
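The restrictions tested above could be sketched as a single validation routine. This is a hypothetical illustration with made-up names, not Impala's frontend code; it generalizes the "first byte of the field delimiter" rule to a prefix check for multi-character delimiters.

```cpp
#include <optional>
#include <string>

// Sketch of delimiter validation for multi-character terminators.
// Returns an error message, or nullopt when the combination is acceptable.
std::optional<std::string> ValidateDelimiters(const std::string& field_term,
                                              const std::string& line_term,
                                              std::optional<char> escape_char) {
  // Terminators can't be empty strings.
  if (field_term.empty() || line_term.empty())
    return "Field/line terminators can't be an empty string";
  // '\0' inside a terminator broke the metastore RPC, so reject it up front.
  if (field_term.find('\0') != std::string::npos ||
      line_term.find('\0') != std::string::npos)
    return "Terminators can't contain '\\0'";
  // With multi-char delimiters, neither terminator may be a prefix of the
  // other, or the scanner can't tell which delimiter it has matched.
  if (field_term.compare(0, line_term.size(), line_term) == 0 ||
      line_term.compare(0, field_term.size(), field_term) == 0)
    return "Line terminator can't be a prefix of the field terminator "
           "(or vice versa)";
  // Escape character colliding with the field delimiter's first byte is
  // ambiguous; the original code downgraded this to a warning instead.
  if (escape_char && *escape_char == field_term[0])
    return "Escape char can't be the first byte of the field delimiter";
  return std::nullopt;
}
```

Whether collisions involving the escape character should be hard errors or warnings (as in the original single-character code) is exactly the open question in the thread above.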
Re: HBase errors prevent run-all-tests.sh
Several thousand lines of things like WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4): failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337 java.lang.NullPointerException at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:126) ... 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): error creating ShortCircuitReplica. java.io.EOFException: unexpected EOF while reading metadata file header 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing remote block reader. java.net.SocketException: Too many open files 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to /127.0.0.1:31000 for block, add to deadNodes and continue. java.net.SocketException: Too many open files 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any node: java.io.IOException: No live nodes contain block BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 a fter checking nodes = [DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]], ignoredNodes = null No live nodes contain current block Block locations: DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-551 2-4827-bcaf-c922f1e65eb1,DISK] Dead nodes: DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]. Will get new block locations from namenode and retry... 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 2772.7114628272548 msec. 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory: BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-3172.log, block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805): error creating ShortCircuitReplica. 
java.io.IOException: Illegal seek at sun.nio.ch.FileDispatcherImpl.pread0(Native Method) On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada wrote: > Do you see something in the HMaster log? From the error it looks like the > Hbase master hasn't started properly for some reason. > > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple wrote: > >> I tried reloading the data with >> >> ./bin/load-data.py --workloads functional-query >> >> but that gave errors like >> >> Executing HBase Command: hbase shell >> load-functional-query-core-hbase-generated.create >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is >> deprecated. Instead, use io.native.lib.available >> SLF4J: Class path contains multiple SLF4J bindings. >> SLF4J: Found binding in >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >> SLF4J: Found binding in >> >> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an >> explanation. >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] >> >> ERROR: Can't get the locations >> >> Here is some help for this command: >> Start disable of named table: >> hbase> disable 't1' >> hbase> disable 'ns1:t1' >> >> ERROR: Can't get master address from ZooKeeper; znode data == null >> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple wrote: >> > I'm having trouble with my HBase environment, and it's preventing me >> > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. 
I have tried >> > this with a clean build, and I have tried unset LD_LIBRARY_PATH && >> > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh >> > >> > Here is the error I get from compute stats: >> > (./testdata/bin/compute-table-stats.sh) >> > >> > Executing: compute stats functional_hbase.alltypessmall >> > -> Error: ImpalaBeeswaxException: >> > Query aborted:RuntimeException: couldn't retrieve HBase table >> > (functional_hbase.alltypessmall) info: >> > Unable to find region for in functional_hbase.alltypessmall after 35 >> tries. >> > CAUSED BY: NoServerForRegionException: Unable to find region for in >> > functional_hbase.alltypessmall after 35 tries. >> > >> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh >> > >> > Sun Jul 24 15:24:52 PDT 2016, >> > RpcRetryingCaller{globalStartTime=1469399003900, pause=100, >> > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException: >> > com.google.protobuf.ServiceException: >> > >> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException): >> > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is >> > not running yet >> > >> > I tried ./bin/create_testdata.sh, but that exited almost immediately >> > with no
Re: HBase errors prevent run-all-tests.sh
Do you see something in the HMaster log? From the error it looks like the Hbase master hasn't started properly for some reason. On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple wrote: > I tried reloading the data with > > ./bin/load-data.py --workloads functional-query > > but that gave errors like > > Executing HBase Command: hbase shell > load-functional-query-core-hbase-generated.create > 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is > deprecated. Instead, use io.native.lib.available > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > > [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > > [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > > ERROR: Can't get the locations > > Here is some help for this command: > Start disable of named table: > hbase> disable 't1' > hbase> disable 'ns1:t1' > > ERROR: Can't get master address from ZooKeeper; znode data == null > > On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple wrote: > > I'm having trouble with my HBase environment, and it's preventing me > > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. 
I have tried > > this with a clean build, and I have tried unset LD_LIBRARY_PATH && > > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh > > > > Here is the error I get from compute stats: > > (./testdata/bin/compute-table-stats.sh) > > > > Executing: compute stats functional_hbase.alltypessmall > > -> Error: ImpalaBeeswaxException: > > Query aborted:RuntimeException: couldn't retrieve HBase table > > (functional_hbase.alltypessmall) info: > > Unable to find region for in functional_hbase.alltypessmall after 35 > tries. > > CAUSED BY: NoServerForRegionException: Unable to find region for in > > functional_hbase.alltypessmall after 35 tries. > > > > Here is a snippet of the error in ./testdata/bin/split-hbase.sh > > > > Sun Jul 24 15:24:52 PDT 2016, > > RpcRetryingCaller{globalStartTime=1469399003900, pause=100, > > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException: > > com.google.protobuf.ServiceException: > > > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException): > > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is > > not running yet > > > > I tried ./bin/create_testdata.sh, but that exited almost immediately > > with no error. > > > > Has anyone else seen and solved this before? > -- Thanks, Bharath
[Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
Youwei Wang has posted comments on this change. Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2. .. Patch Set 34: (3 comments) http://gerrit.cloudera.org:8080/#/c/3081/32/be/src/benchmarks/bswap-benchmark.cc File be/src/benchmarks/bswap-benchmark.cc: Line 46: // AVX2 25.57X > Go ahead and put the wide result in, please. Done http://gerrit.cloudera.org:8080/#/c/3081/34/be/src/benchmarks/bswap-benchmark.cc File be/src/benchmarks/bswap-benchmark.cc: PS34, Line 42: Average > median Done http://gerrit.cloudera.org:8080/#/c/3081/34/be/src/util/bit-util.cc File be/src/util/bit-util.cc: Line 19: inline void ByteSwapScalarLoop(void* dst, const void* src, int len) { > Take this out of namespace impala - if it is only used in this file, put i Done -- To view, visit http://gerrit.cloudera.org:8080/3081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 Gerrit-PatchSet: 34 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Youwei Wang Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2.
Youwei Wang has uploaded a new patch set (#35). Change subject: IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2. .. IMPALA-2809: Improve ByteSwap with builtin function or SSSE3 or AVX2. SSSE3/AVX2 intrinsics are used to accelerate the function "static inline void ByteSwap(void* dst, const void* src, int len)" of the BitUtil class. A scalar byte-swap routine is added as a fallback, and a runtime selector chooses the best implementation for the host CPU's capabilities. A performance test and data verification are also included. Brief performance comparison: CPU: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz Result: I0725 20:47:02.402506 2078 bswap-benchmark.cc:117] Machine Info: Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz

ByteSwap benchmark:
Function        10%ile    50%ile    90%ile    10%ile      50%ile      90%ile
              iters/ms  iters/ms  iters/ms  (relative)  (relative)  (relative)
FastScalar         675       725       731          1X          1X          1X
SSSE3         6.12e+03   6.2e+03  6.23e+03       9.06X       8.55X       8.53X
AVX2          1.87e+04  1.88e+04  1.89e+04       27.7X       25.9X       25.9X
SIMD          1.82e+04  1.88e+04  1.89e+04         27X       25.9X       25.9X

Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bswap-benchmark.cc M be/src/exprs/string-functions-ir.cc M be/src/util/bit-util-test.cc M be/src/util/bit-util.cc M be/src/util/bit-util.h 6 files changed, 368 insertions(+), 52 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/81/3081/35 -- To view, visit http://gerrit.cloudera.org:8080/3081 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I392ed5a8d5683f30f161282c228c1aedd7b648c1 Gerrit-PatchSet: 35 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Youwei Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Youwei Wang
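For readers following along, the builtin-function path that the patch's commit message refers to can be sketched in a few lines. This is an illustrative sketch only, not Impala's actual bit-util.cc: it reverses a buffer using the GCC/Clang `__builtin_bswap64` intrinsic on 8-byte chunks, with a scalar loop as the tail fallback; the SSSE3/AVX2 code paths and the runtime CPU-capability selector are omitted, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Sketch: write the bytes of src into dst in reverse order.
// For each 8-byte chunk, load from the tail of src, byte-swap the
// word with the compiler builtin, and store at the head of dst.
// The memcpy-in / bswap / memcpy-out sequence reverses the bytes
// regardless of host endianness.
static void ByteSwapSketch(void* dst, const void* src, int len) {
  const uint8_t* s = static_cast<const uint8_t*>(src);
  uint8_t* d = static_cast<uint8_t*>(dst);
  int i = 0;
  while (len - i >= 8) {
    uint64_t w;
    std::memcpy(&w, s + len - i - 8, 8);
    w = __builtin_bswap64(w);  // single BSWAP instruction on x86-64
    std::memcpy(d + i, &w, 8);
    i += 8;
  }
  // Scalar fallback for the remaining 0-7 bytes.
  for (; i < len; ++i) d[i] = s[len - i - 1];
}
```

The real implementation additionally dispatches at runtime to `_mm_shuffle_epi8`-based SSSE3 and AVX2 kernels when the CPU supports them, which is where the 8-27x speedups in the benchmark come from.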
Re: HBase errors prevent run-all-tests.sh
I tried reloading the data with ./bin/load-data.py --workloads functional-query but that gave errors like Executing HBase Command: hbase shell load-functional-query-core-hbase-generated.create 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] ERROR: Can't get the locations Here is some help for this command: Start disable of named table: hbase> disable 't1' hbase> disable 'ns1:t1' ERROR: Can't get master address from ZooKeeper; znode data == null On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple wrote: > I'm having trouble with my HBase environment, and it's preventing me > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have tried > this with a clean build, and I have tried unset LD_LIBRARY_PATH && > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh > > Here is the error I get from compute stats: > (./testdata/bin/compute-table-stats.sh) > > Executing: compute stats functional_hbase.alltypessmall > -> Error: ImpalaBeeswaxException: > Query aborted:RuntimeException: couldn't retrieve HBase table > (functional_hbase.alltypessmall) info: > Unable to find region for in functional_hbase.alltypessmall after 35 tries. > CAUSED BY: NoServerForRegionException: Unable to find region for in > functional_hbase.alltypessmall after 35 tries. 
> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh > > Sun Jul 24 15:24:52 PDT 2016, > RpcRetryingCaller{globalStartTime=1469399003900, pause=100, > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException: > com.google.protobuf.ServiceException: > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException): > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is > not running yet > > I tried ./bin/create_testdata.sh, but that exited almost immediately > with no error. > > Has anyone else seen and solved this before?
HBase errors prevent run-all-tests.sh
I'm having trouble with my HBase environment, and it's preventing me from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have tried this with a clean build, and I have tried unset LD_LIBRARY_PATH && bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh Here is the error I get from compute stats: (./testdata/bin/compute-table-stats.sh) Executing: compute stats functional_hbase.alltypessmall -> Error: ImpalaBeeswaxException: Query aborted:RuntimeException: couldn't retrieve HBase table (functional_hbase.alltypessmall) info: Unable to find region for in functional_hbase.alltypessmall after 35 tries. CAUSED BY: NoServerForRegionException: Unable to find region for in functional_hbase.alltypessmall after 35 tries. Here is a snippet of the error in ./testdata/bin/split-hbase.sh Sun Jul 24 15:24:52 PDT 2016, RpcRetryingCaller{globalStartTime=1469399003900, pause=100, retries=31}, org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException): org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet I tried ./bin/create_testdata.sh, but that exited almost immediately with no error. Has anyone else seen and solved this before?
[Impala-CR](cdh5-trunk) IMPALA-2979: Fix scheduling on remote hosts
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2979: Fix scheduling on remote hosts .. Patch Set 32: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/2200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858 Gerrit-PatchSet: 32 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2979: Fix scheduling on remote hosts
Lars Volker has posted comments on this change. Change subject: IMPALA-2979: Fix scheduling on remote hosts .. Patch Set 31: Code-Review+2 +2 from Marcel. Fixed small errors related to tests running on a localhost multi-node cluster and running within gtest tests. I don't deem those critical by any means. Did a performance run, too. Everything seems to be ok, but I'll double check with Mostafa. -- To view, visit http://gerrit.cloudera.org:8080/2200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858 Gerrit-PatchSet: 31 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: anujphadke Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-2979: Fix scheduling on remote hosts
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/2200 to look at the new patch set (#31). Change subject: IMPALA-2979: Fix scheduling on remote hosts .. IMPALA-2979: Fix scheduling on remote hosts Also fixes: IMPALA-2400, IMPALA-3043 This change fixes scheduling scan-ranges on remote hosts by adding remote backend selection capability to SimpleScheduler. Prior to this change the scheduler would try to select a local backend even when remote scheduling was requested. This change also allows pseudo-randomized remote backend selection to prevent convoying, which could happen when different independent schedulers had the same internal state, e.g. after a cluster restart. To enable the new behavior set the query option SCHEDULE_RANDOM_REPLICA to true. This change also fixes IMPALA-2400: Unpredictable locality behavior for reading Parquet files This change also fixes IMPALA-3043: SimpleScheduler does not handle hosts with multiple IP addresses correctly This change also does some clean-up in scheduler.h and simple-scheduler.{h,cc}. 
Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858 --- M be/src/scheduling/scheduler.h M be/src/scheduling/simple-scheduler-test.cc M be/src/scheduling/simple-scheduler.cc M be/src/scheduling/simple-scheduler.h M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/com/cloudera/impala/analysis/TableRef.java M fe/src/test/java/com/cloudera/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/com/cloudera/impala/analysis/ParserTest.java M fe/src/test/java/com/cloudera/impala/analysis/ToSqlTest.java 12 files changed, 984 insertions(+), 489 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/00/2200/31 -- To view, visit http://gerrit.cloudera.org:8080/2200 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I044f83806fcde820fcb38047cf6b8e780d803858 Gerrit-PatchSet: 31 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: anujphadke
Re: IMPALA-2428 Support multiple-character string as the field delimiter
We must be very careful about breaking changes. We may want to put this change in Impala 3.0, rather than 2.x, if it breaks existing DDL statements. > Field terminator can't be an empty string How is this different from the current restrictions on field terminators? If field terminators can currently be empty strings, what kind of queries or DDL statements does this break? Do we currently have tests for those? Do we expect that many users are using them? I have the same questions about your three other restrictions.