Re: Issue with Hive and table with lots of column
Ok, thanks. So given everything we know, the choices I see are:

1. Increase your heap size some more (and, of course, confirm that the process where you reported -Xmx8192M is the HiveServer2 process).
2. Modify your query so that it doesn't use select *.
3. Modify your query so that it does its own buffering; maybe stream it?
4. Create a JIRA ticket and request that the internal buffer size HiveServer2 uses for staging results be made configurable.

That's all _I_ have left in the tank for this issue. I think we need an SME who is familiar with the code now.

Regards,
Stephen.

On Tue, Feb 18, 2014 at 10:57 AM, David Gayou david.ga...@kxen.com wrote: [...]
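For options 2 and 3 above, a minimal sketch assuming a beeline client (the JDBC URL and column names are illustrative): projecting only the columns you need keeps each row small, and beeline's --incremental flag prints rows as they arrive instead of buffering the whole result set client-side.

    # stream rows instead of materializing the full result in the client
    beeline -u jdbc:hive2://localhost:10000 --incremental=true \
        -e "SELECT id, var1, var2 FROM bigtable"

Note this only relieves the client side; the server-side staging buffer from option 4 is a separate problem.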
Re: Issue with Hive and table with lots of column
I'm so sorry, I wrote an answer and forgot to send it. And I haven't been able to work on this for a few days.

So far: I have a table with 15k columns and 50k rows. I do not see any changes if I change the storage format.

*Hive 0.12.0*

My test query is select * from bigtable.

If I use the hive cli, it works fine. If I use hiveserver1 + ODBC, it works fine. If I use hiveserver2 + ODBC or hiveserver2 + beeline, I get this java exception:

    2014-02-18 13:22:22,571 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2734)
        at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
        at java.util.ArrayList.add(ArrayList.java:351)
        at org.apache.hive.service.cli.thrift.TRow.addToColVals(TRow.java:160)
        at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
        at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
        at org.apache.hive.service.cli.operation.SQLOperation.prepareFromRow(SQLOperation.java:270)
        at org.apache.hive.service.cli.operation.SQLOperation.decode(SQLOperation.java:262)
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:246)
        at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:171)
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:438)
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:346)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:407)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)

*From the SVN trunk* (for HIVE-3746): with the maven change, most of the documentation and wiki are out of date. Compiling from trunk was not that easy and I may have failed some steps, but: it has the same behavior. It works in the CLI and hiveserver1. It fails with hiveserver2.

Regards,
David Gayou

On Thu, Feb 13, 2014 at 3:11 AM, Navis류승우 navis@nexr.com wrote: [...]
Re: Issue with Hive and table with lots of column
He lives on after all! And thanks for the continued feedback. We need the answers to these questions using HS2:

1. What is the output of ps -ef | grep -i hiveserver2 on your system? In particular, what is the value of -Xmx?
2. Does select * from table limit 1 work?

Thanks,
Stephen.

On Tue, Feb 18, 2014 at 6:32 AM, David Gayou david.ga...@kxen.com wrote: [...]
Re: Issue with Hive and table with lots of column
1. I have no process with hiveserver2... ps -ef | grep -i hive returns some pretty long command with a -Xmx8192, and that's the value set in hive-env.sh.
2. The select * from table limit 1 (or even 100) works correctly.

David.

On Tue, Feb 18, 2014 at 4:16 PM, Stephen Sprague sprag...@gmail.com wrote: [...]
Re: Issue with Hive and table with lots of column
Thanks. Re #1: we need to find that HiveServer2 process. For all I know, the one you reported is hiveserver1 (which works). Chances are they use the same -Xmx value, but we really shouldn't make any assumptions. Try wide format on the ps command (e.g. ps -efw | grep -i hiveserver2).

Re #2: okay. So that tells us it's not the number of columns blowing the heap, but rather the combination of rows + columns. There's no way it stores the full result set on the heap even under normal circumstances, so my guess is there's an internal number of rows it buffers, sorta like how unix buffers stdout. How and where that's set is out of my league. However, maybe you can get around it by upping your heap size again, if you have the available memory of course.

On Tue, Feb 18, 2014 at 8:39 AM, David Gayou david.ga...@kxen.com wrote: [...]
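Two ways to pin down the right JVM and its -Xmx (the bracket in the grep pattern just keeps the grep process itself out of the output):

    # wide output so ps doesn't truncate the long java command line
    ps -efw | grep -i '[h]iveserver2'

    # alternative: jps (ships with the JDK) lists JVMs with their arguments
    jps -lvm | grep -i hiveserver2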
Re: Issue with Hive and table with lots of column
Oh, I just noticed the -Xmx value you reported: there's no M or G after that number?? I'd like to see -Xmx8192M or -Xmx8G. That *is* very important.

thanks,
Stephen.

On Tue, Feb 18, 2014 at 9:22 AM, Stephen Sprague sprag...@gmail.com wrote: [...]
Re: Issue with Hive and table with lots of column
Sorry, I reported it badly. It's 8192M.

Thanks,
David.

On 18 Feb 2014 at 18:37, Stephen Sprague sprag...@gmail.com wrote: [...]
Re: Issue with Hive and table with lots of column
With HIVE-3746, which will be included in hive-0.13, HiveServer2 takes less memory than before. Could you try it with the version in trunk?

2014-02-13 10:49 GMT+09:00 Stephen Sprague sprag...@gmail.com:
Question to the original poster: closure appreciated!

On Fri, Jan 31, 2014 at 12:22 PM, Stephen Sprague sprag...@gmail.com wrote: [...]
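For anyone else trying the trunk suggestion, a rough sketch of the post-mavenization build (the hadoop-1/dist profile names match the build of that era, but the exact flags are a best-effort reconstruction, not verified against every revision):

    svn co http://svn.apache.org/repos/asf/hive/trunk hive-trunk
    cd hive-trunk
    # build a binary distribution against the Hadoop 1.x line
    mvn clean package -DskipTests -Phadoop-1,dist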
Re: Issue with Hive and table with lots of column
Ok, so here is some news:

I tried to boost HADOOP_HEAPSIZE to 8192, and I also set mapred.child.java.opts to 512M. It doesn't seem to have any effect.

I tried it using an ODBC driver => fails after a few minutes. Using local JDBC (beeline) => runs forever without any error. Both through hiveserver2. If I use local mode: it works! (But that's not really what I need, as I don't really know how to access it with my software.)

I use a text file as storage. I tried to use ORC, but I can't populate it with a load data (it returns a file format error). Using an ALTER TABLE orange_large_train_3 SET FILEFORMAT ORC after populating the table, I get a file format error on select.

@Edward: I've tried to look around for how I can change the thrift heap size but haven't found anything. Same thing for my client (haven't found how to change its heap size). My use case is really to have as many columns as possible.

Thanks a lot for your help.
Regards,
David

On Fri, Jan 31, 2014 at 1:12 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
Ok, here are the problem(s). Thrift has frame size limits, and thrift has to buffer rows into memory. Hive thrift has a heap size; it needs to be big in this case. Your client needs a big heap size as well. The way to do this query, if it is possible, may be turning the rows lateral, potentially by treating each one as a list; it will make queries on it awkward. Good luck.

On Thursday, January 30, 2014, Stephen Sprague sprag...@gmail.com wrote:
Oh, thinking some more about this, I forgot to ask some other basic questions. a) What storage format are you using for the table (text, sequence, rcfile, orc, or custom)? show create table would yield that. b) What command is causing the stack trace? My thinking here is that rcfile and orc are column based (I think), and if you don't select all the columns, that could very well limit the size of the row being returned and hence the size of the internal ArrayList. OTOH, if you're using select *, um, you have my sympathies. :)

On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague sprag...@gmail.com wrote: [...]
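On the ORC population failures above: LOAD DATA only moves files into the table directory, so it cannot turn text files into ORC, and ALTER TABLE ... SET FILEFORMAT only rewrites metadata while the old text files stay behind, which explains the file-format error on select. A sketch of the usual conversion path (table and path names are illustrative, and only a few of the 15k columns are shown):

    -- staging table over the raw text file
    CREATE TABLE orange_txt (id BIGINT, var1 DOUBLE, var2 STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/tmp/orange.tsv' INTO TABLE orange_txt;

    -- rewrite the data through a query so the ORC writer produces the files
    CREATE TABLE orange_orc STORED AS ORC AS SELECT * FROM orange_txt;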
Re: Issue with Hive and table with lots of column
Excellent progress, David. So. The most important thing we learned here is that it works (!) when running hive in local mode, and that this error is a limitation in HiveServer2. That's important.

So: textfile storage handler, and having issues converting it to ORC. Hmmm. Follow-ups:

1. What is your query that fails?
2. Can you add a limit 1 to the end of your query and tell us if that works? This'll tell us if it's column or row bound.
3. Bonus points. Run these in local mode:

    set hive.exec.compress.output=true;
    set mapred.output.compression.type=BLOCK;
    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
    create table blah stored as ORC as select * from your_table;  -- i'm curious if this'll work
    show create table blah;  -- send output back if the previous step worked

4. Extra bonus: change ORC to SEQUENCEFILE in #3 and see if that works any differently. I'm wondering if compression would have any effect on the size of the internal ArrayList the thrift server uses.

On Fri, Jan 31, 2014 at 9:21 AM, David Gayou david.ga...@kxen.com wrote: [...]
Re: Issue with Hive and table with lots of column
Final table compression should not affect the deserialized size of the data over the wire.

On Fri, Jan 31, 2014 at 2:49 PM, Stephen Sprague sprag...@gmail.com wrote: [...]
Re: Issue with Hive and table with lots of column
Thanks, Ed. And on a separate tack, let's look at HiveServer2.

@OP: *I've tried to look around on how i can change the thrift heap size but haven't found anything.*

Looking at my hiveserver2, I find this:

    $ ps -ef | grep -i hiveserver2
    dwr   9824 20479  0 12:11 pts/1  00:00:00 grep -i hiveserver2
    dwr  28410     1  0 00:05 ?      00:01:04 /usr/lib/jvm/java-6-sun/jre/bin/java *-Xmx256m* -Dhadoop.log.dir=/usr/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/lib/hive/lib/hive-service-0.12.0.jar org.apache.hive.service.server.HiveServer2

Questions:
1. What is the output of ps -ef | grep -i hiveserver2 on your system? In particular, what is the value of -Xmx?
2. Can you restart your hiveserver with -Xmx1g, or some value that makes sense for your system?

Lots of questions now. We await your answers! :)

On Fri, Jan 31, 2014 at 11:51 AM, Edward Capriolo edlinuxg...@gmail.com wrote: [...]
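A sketch of question 2 (restarting with a bigger heap), assuming the stock launcher scripts: on Hive 0.12 the server's -Xmx is normally derived from HADOOP_HEAPSIZE (in MB), exported from conf/hive-env.sh.

    # stop the old instance first, then:
    export HADOOP_HEAPSIZE=1024    # illustrative; size it to your machine
    hive --service hiveserver2 &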
Re: Issue with Hive and table with lots of column
We are using Hive 0.12.0, but it doesn't work any better on hive 0.11.0 or hive 0.10.0. Our hadoop version is 1.1.2. Our cluster is 1 master + 4 slaves, each with 1 dual-core Xeon CPU (with hyperthreading, so 4 cores per machine) + 16GB RAM.

The error message I get is:

    2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2734)
        at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
        at java.util.ArrayList.add(ArrayList.java:351)
        at org.apache.hive.service.cli.Row.<init>(Row.java:47)
        at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
        at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
        at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
        at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
        at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
        at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
        at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
        at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        at java.lang.Thread.run(Thread.java:662)

My HADOOP_HEAPSIZE is set to 4096 in hive-env.sh.

We are doing machine learning on a row-by-row basis on those datasets, so basically the more columns we have, the better. We are coming from the SQL world, and Hive is the closest to SQL syntax; we'd like to keep some SQL manipulation on the data.

Thanks for the help.
Regards,
David Gayou

On Tue, Jan 28, 2014 at 8:35 PM, Stephen Sprague sprag...@gmail.com wrote:
There's always a use case out there that stretches the imagination, isn't there? Gotta love it. First things first: can you share the error message? The hive version? And the number of nodes in your cluster?

Then a couple of things come to my mind. Might you consider pivoting the data such that you represent one row of 15K columns as 15K rows of, say, 3 columns (id, column_name, column_value) before you even load it into hive? The other thing is, when I hear 15K columns the first thing I think of is HBase (their motto is millions of columns and billions of rows).

Anyway, let's see what you've got for the first question! :)
cheers,
Stephen.
On Tue, Jan 28, 2014 at 3:20 AM, David Gayou david.ga...@kxen.com wrote: Hello, I'm trying to test Hive with tables including quite a lot of columns. We are using the data from the KDD Cup 2009, based on an anonymised real-case dataset. http://www.sigkdd.org/kdd-cup-2009-customer-relationship-prediction The aim is to be able to create and manipulate a table with 15,000 columns. We were actually able to create the table and to load data into it. You can find the create statement inside the attached file. The data file is pretty big, but i can share it if anyone wants it. The statement SELECT * FROM orange_large_train_3 LIMIT 1000 works fine, but SELECT * FROM orange_large_train_3 doesn't. We have tried several options for creating the table, including the ColumnarSerde row format, but couldn't make it work. Does any of you have any server configuration or storage format to use when creating the table in order to make it work with such a number of columns? Regards, David Gayou
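Stephen's pivot suggestion above can be sketched in HiveQL itself. This is illustration only, not a tested recipe: the tall table name is invented, the column names (Var1, Var2, ...) are assumed, the stack() argument list would have to be generated for all 15K columns by a script, the CASTs keep the name/value pairs type-consistent, and it assumes the wide table has (or can be given) a row id:

    # hypothetical pivot of wide rows into (id, column_name, column_value)
    hive -e "
    CREATE TABLE orange_tall (id BIGINT, column_name STRING, column_value STRING);
    INSERT OVERWRITE TABLE orange_tall
    SELECT t.id, kv.col_name, kv.col_value
    FROM orange_large_train_3 t
    LATERAL VIEW stack(3,
        'Var1', CAST(t.Var1 AS STRING),
        'Var2', CAST(t.Var2 AS STRING),
        'Var3', CAST(t.Var3 AS STRING)) kv AS col_name, col_value;
    "

One row of 15K columns becomes 15K skinny rows, so no single fetch ever materializes a 15K-element row in the HiveServer2 heap.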
Re: Issue with Hive and table with lots of column
thanks for the information. Up-to-date hive. Cluster on the smallish side. And, well, sure looks like a memory issue. :) rather than an inherent hive limitation, that is. So. I can only speak as a user (ie. not a hive developer) but what i'd be interested in knowing next is: is this via running hive in local mode, correct? (eg. not through hiveserver1/2). And it looks like it boinks on array processing, which i assume to be internal code arrays and not hive data arrays - your 15K columns are all scalar/simple types, correct? It's clearly fetching results and looks to be trying to store them in a java array - and not just one row but a *set* of rows (ArrayList). Three things to try: 1. boost the heap-size. try 8192. And I don't know if HADOOP_HEAPSIZE is the controller of that. I woulda hoped it was called something like HIVE_HEAPSIZE. :) Anyway, can't hurt to try. 2. trim down the number of columns and see where the breaking point is. is it 10K? is it 5K? The idea is to confirm it's _the number of columns_ that is causing the memory to blow and not some other artifact unbeknownst to us. 3. Google around the Hive namespace for something that might limit or otherwise control the number of rows stored at once in Hive's internal buffer. I'll snoop around too. That's all i got for now and maybe we'll get lucky and someone on this list will know something or other about this. :) cheers, Stephen. On Thu, Jan 30, 2014 at 2:32 AM, David Gayou david.ga...@kxen.com wrote: We are using Hive 0.12.0, but it doesn't work any better on hive 0.11.0 or hive 0.10.0. Our hadoop version is 1.1.2. Our cluster is 1 master + 4 slaves with 1 dual-core Xeon CPU each (with hyperthreading, so 4 cores per machine) + 16GB RAM each. The error message i get is:

2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.apache.hive.service.cli.Row.init(Row.java:47)
    at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
    at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

My HADOOP_HEAPSIZE is set to 4096 in hive-env.sh. We are doing some machine learning on a row-by-row basis on those datasets, so basically the more columns we have the better. We are coming from the SQL world, and Hive is the closest to SQL syntax. We'd like to keep some SQL manipulation on the data. Thanks for the help, Regards, David Gayou On Tue, Jan 28, 2014 at 8:35 PM, Stephen Sprague sprag...@gmail.com wrote: there's always a use case out there that stretches the imagination, isn't there? gotta love it. first things first. can you share the error message? the hive version? and the number of nodes in your cluster? then a couple of things come to mind. Might you consider pivoting the data such that you represent one row of 15K columns as 15K rows of, say, 3 columns (id, column_name, column_value) before
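For what it's worth, suggestion 1 above usually comes down to a couple of lines in hive-env.sh. A sketch under the assumption that your distribution's startup scripts actually honor these variables (they don't always; confirm the -Xmx on the running HiveServer2 process afterwards):

    # hive-env.sh -- HADOOP_HEAPSIZE is read as megabytes
    export HADOOP_HEAPSIZE=8192
    # some installs size the service JVM via HADOOP_OPTS instead
    # (assumption: adjust to whichever variable your scripts use)
    export HADOOP_OPTS="-Xmx8192m $HADOOP_OPTS"

Then restart HiveServer2 so the new heap takes effect.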
Re: Issue with Hive and table with lots of column
oh. thinking some more about this i forgot to ask some other basic questions. a) what storage format are you using for the table (text, sequence, rcfile, orc or custom)? show create table <table> would yield that. b) what command is causing the stack trace? my thinking here is rcfile and orc are column-based (i think) and if you don't select all the columns that could very well limit the size of the row being returned and hence the size of the internal ArrayList. OTOH, if you're using select *, um, you have my sympathies. :) On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague sprag...@gmail.com wrote: thanks for the information. Up-to-date hive. Cluster on the smallish side. And, well, sure looks like a memory issue. :) rather than an inherent hive limitation, that is. So. I can only speak as a user (ie. not a hive developer) but what i'd be interested in knowing next is: is this via running hive in local mode, correct? (eg. not through hiveserver1/2). And it looks like it boinks on array processing, which i assume to be internal code arrays and not hive data arrays - your 15K columns are all scalar/simple types, correct? It's clearly fetching results and looks to be trying to store them in a java array - and not just one row but a *set* of rows (ArrayList). Three things to try: 1. boost the heap-size. try 8192. And I don't know if HADOOP_HEAPSIZE is the controller of that. I woulda hoped it was called something like HIVE_HEAPSIZE. :) Anyway, can't hurt to try. 2. trim down the number of columns and see where the breaking point is. is it 10K? is it 5K? The idea is to confirm it's _the number of columns_ that is causing the memory to blow and not some other artifact unbeknownst to us. 3. Google around the Hive namespace for something that might limit or otherwise control the number of rows stored at once in Hive's internal buffer. I'll snoop around too. That's all i got for now and maybe we'll get lucky and someone on this list will know something or other about this. :) cheers, Stephen. On Thu, Jan 30, 2014 at 2:32 AM, David Gayou david.ga...@kxen.com wrote: We are using Hive 0.12.0, but it doesn't work any better on hive 0.11.0 or hive 0.10.0. Our hadoop version is 1.1.2.
Our cluster is 1 master + 4 slaves with 1 dual-core Xeon CPU each (with hyperthreading, so 4 cores per machine) + 16GB RAM each. The error message i get is:

2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.apache.hive.service.cli.Row.init(Row.java:47)
    at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
    at org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)

My HADOOP_HEAPSIZE is set to 4096 in hive-env.sh. We are doing some machine learning on a row-by-row basis on those datasets, so basically the more columns we have the better. We are coming from the SQL
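Suggestion 2 (find the breaking point) can be roughed out as a loop. Purely a sketch: it assumes a hypothetical columns.txt with one column name per line (extracted from a describe of the table), that the generated SELECT list fits in a shell argument, and the usual caveat that beeline flags vary by version:

    # try progressively wider selects until one blows the heap
    for n in 1000 2500 5000 10000 15000; do
        cols=$(head -n "$n" columns.txt | paste -s -d, -)
        echo "trying $n columns"
        beeline -u jdbc:hive2://localhost:10000 \
            -e "SELECT $cols FROM orange_large_train_3" > /dev/null || break
    done

Where it breaks tells you whether the column count alone is the trigger, or whether it is rows times columns that matters.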
Re: Issue with Hive and table with lots of column
Ok, here are the problem(s). Thrift has frame size limits; thrift has to buffer rows into memory. Hive thrift has a heap size, and it needs to be big in this case. Your client needs a big heap size as well. The way to do this query, if it is possible, may be turning the rows lateral, potentially by treating each row as a list; it will make queries on it awkward. Good luck. On Thursday, January 30, 2014, Stephen Sprague sprag...@gmail.com wrote: oh. thinking some more about this i forgot to ask some other basic questions. a) what storage format are you using for the table (text, sequence, rcfile, orc or custom)? show create table <table> would yield that. b) what command is causing the stack trace? my thinking here is rcfile and orc are column-based (i think) and if you don't select all the columns that could very well limit the size of the row being returned and hence the size of the internal ArrayList. OTOH, if you're using select *, um, you have my sympathies. :) On Thu, Jan 30, 2014 at 11:33 AM, Stephen Sprague sprag...@gmail.com wrote: thanks for the information. Up-to-date hive. Cluster on the smallish side. And, well, sure looks like a memory issue. :) rather than an inherent hive limitation, that is. So. I can only speak as a user (ie. not a hive developer) but what i'd be interested in knowing next is: is this via running hive in local mode, correct? (eg. not through hiveserver1/2). And it looks like it boinks on array processing, which i assume to be internal code arrays and not hive data arrays - your 15K columns are all scalar/simple types, correct? It's clearly fetching results and looks to be trying to store them in a java array - and not just one row but a *set* of rows (ArrayList). Three things to try: 1. boost the heap-size. try 8192. And I don't know if HADOOP_HEAPSIZE is the controller of that. I woulda hoped it was called something like HIVE_HEAPSIZE. :) Anyway, can't hurt to try. 2. trim down the number of columns and see where the breaking point is. is it 10K? is it 5K? The idea is to confirm it's _the number of columns_ that is causing the memory to blow and not some other artifact unbeknownst to us. 3. Google around the Hive namespace for something that might limit or otherwise control the number of rows stored at once in Hive's internal buffer. I'll snoop around too. That's all i got for now and maybe we'll get lucky and someone on this list will know something or other about this. :) cheers, Stephen. On Thu, Jan 30, 2014 at 2:32 AM, David Gayou david.ga...@kxen.com wrote: We are using Hive 0.12.0, but it doesn't work any better on hive 0.11.0 or hive 0.10.0. Our hadoop version is 1.1.2.
Our cluster is 1 master + 4 slaves with 1 dual-core Xeon CPU each (with hyperthreading, so 4 cores per machine) + 16GB RAM each. The error message i get is:

2014-01-29 12:41:09,086 ERROR thrift.ProcessFunction (ProcessFunction.java:process(41)) - Internal error processing FetchResults
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2734)
    at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
    at java.util.ArrayList.add(ArrayList.java:351)
    at org.apache.hive.service.cli.Row.init(Row.java:47)
    at org.apache.hive.service.cli.RowSet.addRow(RowSet.java:61)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:235)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
    at org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
    at java.security.AccessCont

-- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
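To make the "treat it as a list" idea concrete: each row stays one row, but the 15K values collapse into a single MAP column, so nothing on the Thrift path ever carries 15K separate fields per row. Invented names again, and loading the map from the existing wide table would need the same kind of generated column list as the pivot sketched earlier:

    hive -e "
    CREATE TABLE orange_map (id BIGINT, features MAP<STRING,STRING>);
    -- queries do get more awkward, as noted: named columns become lookups
    SELECT id, features['Var123'] FROM orange_map LIMIT 10;
    "

The trade-off is exactly the awkwardness Edward mentions: every reference to a feature goes through features['name'] instead of a typed column.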