[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-04-07 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962335#comment-13962335
 ] 

Eric Chu commented on HIVE-6134:


Hi [~xuefuz] and [~ashutoshc], it turns out this issue not only affects Hue 
but also the Hive CLI - results won't show up in the CLI until more than a 
minute has passed, along with timeout errors for connections to nodes.

I'm trying to make the change myself in GenMRFileSink1.java to support a new 
property such that, when it is turned on, Hive will merge files for a regular 
(i.e., without an mvTask), map-only job that uses more than X mappers (with X 
set by another new property). I'm wondering if and how we could find out the 
number of mappers that will be used for that job when we are at that stage of 
the optimization. I want to set chDir to true when this number is greater than 
some threshold set via a new property. I notice that 
currWork.getMapWork().getNumMapTasks() actually returns null. Can you give me 
some pointers?
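
For concreteness, here is a rough sketch of the kind of check being considered. 
It is only an illustration, not actual Hive code: the class name, the 
"hive.merge.mapfiles.max.mappers" property, and its default of 2000 are made 
up, and the helper is imagined to be called from the point in GenMRFileSink1 
where chDir is decided.

{noformat}
// Sketch only -- not actual Hive code. The class name, the
// "hive.merge.mapfiles.max.mappers" property, and its default of 2000
// are assumptions for illustration.
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
import org.apache.hadoop.hive.ql.plan.MapredWork;

public class MapOnlyMergeSketch {
  /**
   * Returns true if GenMRFileSink1 should set chDir (i.e., add a merge
   * stage) for a map-only job that has no move task.
   */
  public static boolean shouldForceMerge(HiveConf hconf, MapredWork currWork) {
    // Only when map-file merging is enabled at all.
    if (!hconf.getBoolVar(ConfVars.HIVEMERGEMAPFILES)) {
      return false;
    }
    // Map-only job: no reduce work attached.
    if (currWork.getReduceWork() != null) {
      return false;
    }
    // Hypothetical new threshold; 2000 mirrors where Hue starts to time out.
    int threshold = hconf.getInt("hive.merge.mapfiles.max.mappers", 2000);
    // Open question from above: at this stage of compilation
    // getNumMapTasks() returns null, so the mapper count may need to be
    // estimated some other way.
    Integer numMappers = currWork.getMapWork().getNumMapTasks();
    return numMappers != null && numMappers > threshold;
  }
}
{noformat}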

> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4975) Reading orc file throws exception after adding new column

2014-02-17 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903711#comment-13903711
 ] 

Eric Chu commented on HIVE-4975:


Hi [~kevinwilfong], do you have any insights into this issue? 

> Reading orc file throws exception after adding new column
> -
>
> Key: HIVE-4975
> URL: https://issues.apache.org/jira/browse/HIVE-4975
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.11.0
> Environment: hive 0.11.0 hadoop 1.0.0
>Reporter: cyril liao
>Priority: Critical
>  Labels: orcfile
>
> ORC file read fails after adding a table column.
> Create a table with three columns (a string, b string, c string).
> Add a new column after c by executing "ALTER TABLE table ADD COLUMNS (d 
> string)".
> Execute the HiveQL "select d from table"; the following exception is thrown:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row [Error getting row data with 
> exception java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
> d
>   at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOpe

[jira] [Commented] (HIVE-4703) Describe on a table returns "from deserializer" for column comments instead of values supplied in Create Table

2014-02-17 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903613#comment-13903613
 ] 

Eric Chu commented on HIVE-4703:


[~ottomata] We still see this issue in Hive 0.12.

> Describe on a table returns "from deserializer" for column comments instead 
> of values supplied in Create Table
> --
>
> Key: HIVE-4703
> URL: https://issues.apache.org/jira/browse/HIVE-4703
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
> Attachments: webhcatMini.sh
>
>
> This causes Templeton e2e tests to fail.
> Start the WebHCat server (build/dist/hcatalog/sbin/webhcat_server.sh start).
> Run the commands in the attached webhcatMini.sh.
> It creates a table with some comments on columns.
> When executing describe (GET) on this table, the original comments are lost 
> and are replaced with the "from deserializer" string.
> Here is the output of these commands:
> localhost:dev ekoifman$ webhcatMini.sh
> Running delete test_table ifExists
> HTTP/1.1 200 OK
> Set-Cookie: 
> hadoop.auth="u=ekoifman&p=ekoifman&t=simple&e=1370945567179&s=vIBKhGQwzs5pPAY3IkhyPpDkWrY=";Version=1;Path=/;Discard
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Content-Type: application/json
> Transfer-Encoding: chunked
> Server: Jetty(7.6.0.v20120127)
> {"table":"test_table","database":"default"}
> Running create test_table
> HTTP/1.1 200 OK
> Set-Cookie: 
> hadoop.auth="u=ekoifman&p=ekoifman&t=simple&e=1370945569788&s=g37NbyyRnf667IciUiIpIQNYGOo=";Version=1;Path=/;Discard
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Content-Type: application/json
> Transfer-Encoding: chunked
> Server: Jetty(7.6.0.v20120127)
> {"table":"test_table","database":"default"}
> Running describe test_table
> HTTP/1.1 200 OK
> Set-Cookie: 
> hadoop.auth="u=ekoifman&p=ekoifman&t=simple&e=1370945572423&s=7kE1FOn1Co2JQzZfW0V1myqulw0=";Version=1;Path=/;Discard
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Content-Type: application/json
> Transfer-Encoding: chunked
> Server: Jetty(7.6.0.v20120127)
> {"columns":[{"name":"int","comment":"from 
> deserializer","type":"string"},{"name":"int2","comment":"from 
> deserializer","type":"int"}],"database":"default","table":"test_table"}
> Mon Jun 10 17:12:55 PDT 2013



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-4975) Reading orc file throws exception after adding new column

2014-02-17 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903545#comment-13903545
 ] 

Eric Chu commented on HIVE-4975:


Also, the problem does not just happen when we do SELECT *, but any time we 
include the newly added columns in the SELECT clause. The query runs fine when 
we only have the old columns in the SELECT clause.

> Reading orc file throws exception after adding new column
> -
>
> Key: HIVE-4975
> URL: https://issues.apache.org/jira/browse/HIVE-4975
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.11.0
> Environment: hive 0.11.0 hadoop 1.0.0
>Reporter: cyril liao
>Priority: Critical
>  Labels: orcfile
>
> ORC file read fails after adding a table column.
> Create a table with three columns (a string, b string, c string).
> Add a new column after c by executing "ALTER TABLE table ADD COLUMNS (d 
> string)".
> Execute the HiveQL "select d from table"; the following exception is thrown:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row [Error getting row data with 
> exception java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   ... 8 more
> Caused by: org.apache.hadoop.hive

[jira] [Commented] (HIVE-4975) Reading orc file throws exception after adding new column

2014-02-17 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903115#comment-13903115
 ] 

Eric Chu commented on HIVE-4975:


We've also run into this issue. [~owen.omalley], do you or your team have any 
insights on the root cause? I verified that the same problem doesn't happen 
with RCFile; instead, NULL is shown under the newly added column. 

> Reading orc file throws exception after adding new column
> -
>
> Key: HIVE-4975
> URL: https://issues.apache.org/jira/browse/HIVE-4975
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.11.0
> Environment: hive 0.11.0 hadoop 1.0.0
>Reporter: cyril liao
>Priority: Critical
>  Labels: orcfile
>
> ORC file read fails after adding a table column.
> Create a table with three columns (a string, b string, c string).
> Add a new column after c by executing "ALTER TABLE table ADD COLUMNS (d 
> string)".
> Execute the HiveQL "select d from table"; the following exception is thrown:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error while processing row [Error getting row data with 
> exception java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:162)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.ArrayIndexOutOfBoundsException: 4
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcStruct$OrcStructInspector.getStructFieldData(OrcStruct.java:206)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.UnionStructObjectInspector.getStructFieldData(UnionStructObjectInspector.java:128)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:371)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:236)
>   at 
> org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:222)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:665)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
>  ]
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:671)
>   at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:144)
>   ... 

[jira] [Resolved] (HIVE-6210) Default serde for RCFile has changed

2014-01-15 Thread Eric Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Chu resolved HIVE-6210.


Resolution: Not A Problem

> Default serde for RCFile has changed
> 
>
> Key: HIVE-6210
> URL: https://issues.apache.org/jira/browse/HIVE-6210
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>
> In Hive 10 when I create a table in RCFile, the serde is 
> org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
> In Hive 12 when I do the same thing, the serde becomes 
> org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
> Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
> LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. 
> What is the reason behind this change? This seems like a regression bug to me.
> Normally, we can work around the issue by explicitly setting the table serde 
> to be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this 
> causes a problem for our migration to ORC. Specifically, we have a 
> partitioned table for which we want the new partitions to have locations 
> pointing to ORC partitions, and the old partitions to have locations pointing 
> to RCFILE partitions. Moreover, we need the ability to change the location of 
> a partition to point to RCFILE partition. For this we'd do so by doing SET 
> FILEFORMAT RCFILE. However, b/c of this serde problem the RCFile partition in 
> an ORC table will have the wrong serde, and ALTER TABLE doesn't allow us to 
> set serde for a partition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6210) Default serde for RCFile has changed

2014-01-15 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873022#comment-13873022
 ] 

Eric Chu commented on HIVE-6210:


Clarification:
1) I can set the serde for a partition. The error I got was due to not having 
quotes around the serde value.
2) The change comes from HIVE-4475. 

So this is not a bug after all but actually a feature. I hope the Hive release 
notes would mention breaking changes like this; otherwise it's very hard to 
notice these things. For example, we experienced a correctness bug after 
upgrading to Hive 11 that forced us to revert to Hive 10 until the bug was 
fixed. The next time, before we upgraded to 12, we spent a month doing upgrade 
testing, but we still couldn't catch this. 

> Default serde for RCFile has changed
> 
>
> Key: HIVE-6210
> URL: https://issues.apache.org/jira/browse/HIVE-6210
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>
> In Hive 10 when I create a table in RCFile, the serde is 
> org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
> In Hive 12 when I do the same thing, the serde becomes 
> org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
> Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
> LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. 
> What is the reason behind this change? This seems like a regression bug to me.
> Normally, we can work around the issue by explicitly setting the table serde 
> to be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this 
> causes a problem for our migration to ORC. Specifically, we have a 
> partitioned table for which we want the new partitions to have locations 
> pointing to ORC partitions, and the old partitions to have locations pointing 
> to RCFILE partitions. Moreover, we need the ability to change the location of 
> a partition to point to RCFILE partition. For this we'd do so by doing SET 
> FILEFORMAT RCFILE. However, b/c of this serde problem the RCFile partition in 
> an ORC table will have the wrong serde, and ALTER TABLE doesn't allow us to 
> set serde for a partition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6210) Default serde for RCFile has changed

2014-01-15 Thread Eric Chu (JIRA)
Eric Chu created HIVE-6210:
--

 Summary: Default serde for RCFile has changed
 Key: HIVE-6210
 URL: https://issues.apache.org/jira/browse/HIVE-6210
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Eric Chu


In Hive 10 when I create a table in RCFile, the serde is 
org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe

In Hive 12 when I do the same thing, the serde becomes 
org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe

Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. What 
is the reason behind a change? This seems like a regression bug to me.

Normally, we can work around the issue by explicitly setting the table serde to 
be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this causes a 
problem for our migration to ORC. Specifically, we have a partitioned table for 
which we want the new partitions to have locations pointing to ORC partitions, 
and the old partitions to have locations pointing to RCFILE partitions. 
Moreover, we need the ability to change the location of a partition to point to 
RCFILE partition. For this we'd do so by doing SET FILEFORMAT RCFILE. However, 
b/c of this serde problem the RCFile partition in an ORC table will have the 
wrong serde, and ALTER TABLE doesn't allow us to set serde for a partition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HIVE-6210) Default serde for RCFile has changed

2014-01-15 Thread Eric Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Chu updated HIVE-6210:
---

Description: 
In Hive 10 when I create a table in RCFile, the serde is 
org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe

In Hive 12 when I do the same thing, the serde becomes 
org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe

Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. What 
is the reason behind this change? This seems like a regression bug to me.

Normally, we can work around the issue by explicitly setting the table serde to 
be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this causes a 
problem for our migration to ORC. Specifically, we have a partitioned table for 
which we want the new partitions to have locations pointing to ORC partitions, 
and the old partitions to have locations pointing to RCFILE partitions. 
Moreover, we need the ability to change the location of a partition to point to 
RCFILE partition. For this we'd do so by doing SET FILEFORMAT RCFILE. However, 
b/c of this serde problem the RCFile partition in an ORC table will have the 
wrong serde, and ALTER TABLE doesn't allow us to set serde for a partition. 

  was:
In Hive 10 when I create a table in RCFile, the serde is 
org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe

In Hive 12 when I do the same thing, the serde becomes 
org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe

Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. What 
is the reason behind a change? This seems like a regression bug to me.

Normally, we can work around the issue by explicitly setting the table serde to 
be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this causes a 
problem for our migration to ORC. Specifically, we have a partitioned table for 
which we want the new partitions to have locations pointing to ORC partitions, 
and the old partitions to have locations pointing to RCFILE partitions. 
Moreover, we need the ability to change the location of a partition to point to 
RCFILE partition. For this we'd do so by doing SET FILEFORMAT RCFILE. However, 
b/c of this serde problem the RCFile partition in an ORC table will have the 
wrong serde, and ALTER TABLE doesn't allow us to set serde for a partition. 


> Default serde for RCFile has changed
> 
>
> Key: HIVE-6210
> URL: https://issues.apache.org/jira/browse/HIVE-6210
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>
> In Hive 10 when I create a table in RCFile, the serde is 
> org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe
> In Hive 12 when I do the same thing, the serde becomes 
> org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe
> Similarly, in Hive 12, when I set FILEFORMAT to RCFILE, the serde will become 
> LazyBinaryColumnarSerDe, as opposed to ColumnarSerDe in previous versions. 
> What is the reason behind this change? This seems like a regression bug to me.
> Normally, we can work around the issue by explicitly setting the table serde 
> to be org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe. However, this 
> causes a problem for our migration to ORC. Specifically, we have a 
> partitioned table for which we want the new partitions to have locations 
> pointing to ORC partitions, and the old partitions to have locations pointing 
> to RCFILE partitions. Moreover, we need the ability to change the location of 
> a partition to point to RCFILE partition. For this we'd do so by doing SET 
> FILEFORMAT RCFILE. However, b/c of this serde problem the RCFile partition in 
> an ORC table will have the wrong serde, and ALTER TABLE doesn't allow us to 
> set serde for a partition. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5379) NoClassDefFoundError is thrown when using lead/lag with kryo serialization

2014-01-07 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864487#comment-13864487
 ] 

Eric Chu commented on HIVE-5379:


Please ignore the above. False alarm. Sorry for the inconvenience.

> NoClassDefFoundError is thrown when using lead/lag with kryo serialization
> --
>
> Key: HIVE-5379
> URL: https://issues.apache.org/jira/browse/HIVE-5379
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: D13155.1.patch
>
>
> {noformat}
> java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 9 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/antlr/runtime/tree/TreeWizard$ContextVisitor
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at java.lang.Class.getDeclaringClass(Native Method)
>   at java.lang.Class.getEnclosingClass(Class.java:1085)
>   at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1054)
>   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1110)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
>   at 
> org.apache.hadoop.hive.ql.exec.

[jira] [Commented] (HIVE-5379) NoClassDefFoundError is thrown when using lead/lag with kryo serialization

2014-01-07 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864431#comment-13864431
 ] 

Eric Chu commented on HIVE-5379:


To continue from above, we applied this patch and the query above worked, but 
the query "SELECT sum(num) OVER (PARTITION BY name) FROM test_sum" would give 
the following exception:
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.ql.parse.PTFTranslator$LeadLagInfo
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
...

> NoClassDefFoundError is thrown when using lead/lag with kryo serialization
> --
>
> Key: HIVE-5379
> URL: https://issues.apache.org/jira/browse/HIVE-5379
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: D13155.1.patch
>
>
> {noformat}
> java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 9 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/antlr/runtime/tree/TreeWizard$ContextVisitor
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at java.lang.Class.getDeclaringClass(Native Method)
>   at java.lang.Class.getEnclosingClass(Class.java:1085)
>   at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1054)
>   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1110)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>  

[jira] [Commented] (HIVE-5379) NoClassDefFoundError is thrown when using lead/lag with kryo serialization

2014-01-07 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864428#comment-13864428
 ] 

Eric Chu commented on HIVE-5379:


We had the same error when we ran:
SELECT count(num) OVER (PARTITION BY name) FROM test_sum;


> NoClassDefFoundError is thrown when using lead/lag with kryo serialization
> --
>
> Key: HIVE-5379
> URL: https://issues.apache.org/jira/browse/HIVE-5379
> Project: Hive
>  Issue Type: Bug
>  Components: PTF-Windowing
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: D13155.1.patch
>
>
> {noformat}
> java.lang.RuntimeException: Error in configuring object
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>   at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:432)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
>   at org.apache.hadoop.mapred.Child.main(Child.java:260)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>   ... 9 more
> Caused by: java.lang.NoClassDefFoundError: 
> org/antlr/runtime/tree/TreeWizard$ContextVisitor
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>   at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>   at java.lang.Class.getDeclaringClass(Native Method)
>   at java.lang.Class.getEnclosingClass(Class.java:1085)
>   at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1054)
>   at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1110)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:502)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
>   at 
> com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
>   at 
> com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
>   at 
> com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
>   at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
>   at 
> org.

[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-06 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863803#comment-13863803
 ] 

Eric Chu commented on HIVE-6134:


Good point [~xuefuz]. For the case of map-only jobs, we are considering 
checking that the following are all true (a sketch follows the list):
1. hconf.getBoolVar(ConfVars.HIVEMERGEMAPFILES)
2. currWork.getReduceWork() == null
3. currWork.getMapWork().getNumMapTasks() > [threshold], where [threshold] 
could be some configurable value. We have observed that Hue starts to time out 
when the number of output files exceeds 2000.
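
For illustration, the three conditions could be combined roughly as follows 
inside the existing GenMRFileSink1 logic, assuming hconf and currWork are 
already in scope there; the "hive.merge.mapfiles.max.mappers" property name is 
hypothetical.

{noformat}
// Sketch only: a direct transcription of conditions 1-3 above. The
// "hive.merge.mapfiles.max.mappers" property name is hypothetical.
long threshold = hconf.getLong("hive.merge.mapfiles.max.mappers", 2000L);
Integer numMapTasks = currWork.getMapWork().getNumMapTasks();
boolean forceMerge =
    hconf.getBoolVar(ConfVars.HIVEMERGEMAPFILES)  // 1. merging enabled
    && currWork.getReduceWork() == null           // 2. map-only job
    && numMapTasks != null                        // may be null at compile time
    && numMapTasks > threshold;                   // 3. too many mappers
{noformat}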

As for Ashutosh's comment, we could increase the limit on the timeout, but it 
won't completely solve the problem. We're talking about thousands to tens of 
thousands of output files from a single query. Even if it doesn't time out, 
it'll take a noticeably long time to download the result, and the UX will be 
horrible. 

> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-06 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863512#comment-13863512
 ] 

Eric Chu commented on HIVE-6134:


[~xuefuz] Regarding when to merge, isn't that precisely what 
hive.merge.mapfiles, hive.merge.mapredfiles, and hive.merge.smallfiles.avgsize 
are for? I never proposed merging files for every query. Rather, I'm proposing 
to honor these properties for queries without move tasks as well. Users won't 
need to decide when to merge; it would be decided based on the configuration 
and the average output file size, just as it is for queries that result in a 
new table.


> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-06 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863292#comment-13863292
 ] 

Eric Chu commented on HIVE-6134:


Thanks [~ashutoshc] for pointing out the concatenate command. However, I think 
the ability to merge files for a table partition is orthogonal to supporting 
hive.merge.mapfiles, hive.merge.mapredfiles, and hive.merge.smallfiles.avgsize 
for "regular" queries (i.e., ones that don't result in a new table). Even if we 
have the optimal number of input files for each partition, users querying over 
a large number of partitions with just SELECT FROM WHERE clauses will still end 
up with a large number of small output files, with negative side effects such 
as Hue timeouts, a large number of mappers in the next job, etc.

Can someone explain why the properties are supported only for queries with move 
tasks? Was it just a matter of scoping, or is there some reason that makes this 
inappropriate for queries without a move task? We are considering adding this 
support on our own and would like to get some insights on the original design 
considerations. Thanks!



> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-04 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862462#comment-13862462
 ] 

Eric Chu commented on HIVE-6134:


[~xuefu.w...@kodak.com] We notice that the problem occurs when a query results 
in too many files; however, this happens b/c the table has too many (but not 
necessarily small) files. Most of the queries that have this problem are 
regular SELECT FROM WHERE queries (no GROUP BY) that don't have reducers. Some 
of our tables have hundreds of GBs per partition; the biggest one has TBs of 
data per partition. It's not uncommon to see queries with thousands or tens of 
thousands of mappers, but no reducers. 

We are looking at other ways to mitigate this problem. What you suggest - 
merging files in a partition - is certainly something we are considering. 
Meanwhile, I want to consider supporting these properties for queries without a 
move task. Specifically, what are the reasons that we didn't support these 
properties for queries without a move task? And if we want to do so, what 
considerations should we make? We'd be willing to work on this, but we probably 
will need some guidance from domain experts. Thanks!

> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-03 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861870#comment-13861870
 ] 

Eric Chu commented on HIVE-6134:


Thanks Xuefu for the quick response! A few questions/comments:
1. Could you elaborate on why you think it makes sense to only merge small 
files for queries resulting in a new table? Alternatively, what are the issues 
with supporting these properties for regular queries? I'd love to have this 
support for regular queries, unless there's a strong reason against it.
2. If indeed these properties are designed only for queries resulting in a new 
table, then we should mention that in the documentation. Currently it's 
misleading - it sounds like they'd work for regular queries as well.
3. The main pain point here is that users won't know that there are many output 
files until AFTER the query is run. Imagine analysts who don't know these 
details and for whom HUE is the only query interface. It's frustrating and time 
consuming to run a long-running query in Hue, only to find out they can't get 
the results b/c HUE times out trying to read these many small files, and so 
they have to run the query again as CTAS. Having a table just so they could 
download the result seems to be overkill.
4. Do you have a suggestion for the aforementioned HUE issue? Hue starts timing 
out when the query results in thousands of small output files. This is a major 
pain point for our analysts today.

> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-03 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861818#comment-13861818
 ] 

Eric Chu commented on HIVE-6134:


[~brocknoland] and [~xuefuz]: I was talking to Yin Huai about this issue and he 
suggested I ping you on this, especially regarding how it affects the Hue UX as 
mentioned above. 

> Merging small files based on file size only works for CTAS queries
> --
>
> Key: HIVE-6134
> URL: https://issues.apache.org/jira/browse/HIVE-6134
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0
>Reporter: Eric Chu
>
> According to the documentation, if we set hive.merge.mapfiles to true, Hive 
> will launch an additional MR job to merge the small output files at the end 
> of a map-only job when the average output file size is smaller than 
> hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles 
> to true, Hive will merge the output files of a map-reduce job. 
> My expectation is that this is true for all MR queries. However, my 
> observation is that this is only true for CTAS queries. In 
> GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used 
> if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a 
> regular SELECT query that doesn't have move tasks, these properties are not 
> used.
> Is my understanding correct and if so, what's the reasoning behind the logic 
> of not supporting this for regular SELECT queries? It seems to me that this 
> should be supported for regular SELECT queries as well. One scenario where 
> this hits us hard is when users try to download the result in HUE, and HUE 
> times out b/c there are thousands of output files. The workaround is to 
> re-run the query as CTAS, but it's a significant time sink.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6134) Merging small files based on file size only works for CTAS queries

2014-01-03 Thread Eric Chu (JIRA)
Eric Chu created HIVE-6134:
--

 Summary: Merging small files based on file size only works for 
CTAS queries
 Key: HIVE-6134
 URL: https://issues.apache.org/jira/browse/HIVE-6134
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.8.0
Reporter: Eric Chu


According to the documentation, if we set hive.merge.mapfiles to true, Hive 
will launch an additional MR job to merge the small output files at the end of 
a map-only job when the average output file size is smaller than 
hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to 
true, Hive will merge the output files of a map-reduce job. 

My expectation is that this is true for all MR queries. However, my observation 
is that this is only true for CTAS queries. In GenMRFileSink1.java, 
HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() 
!= null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that 
doesn't have move tasks, these properties are not used.

Is my understanding correct and if so, what's the reasoning behind the logic of 
not supporting this for regular SELECT queries? It seems to me that this should 
be supported for regular SELECT queries as well. One scenario where this hits 
us hard is when users try to download the result in HUE, and HUE times out b/c 
there are thousands of output files. The workaround is to re-run the query as 
CTAS, but it's a significant time sink.
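
For what it's worth, here is a rough sketch (not the actual GenMRFileSink1 code) 
of the decision described above: the merge flags are only consulted when the 
query has a move task, which is why a plain SELECT never gets the extra merge job.

import org.apache.hadoop.hive.conf.HiveConf;

// Illustrative only; the class and method names are made up, not Hive internals.
public class MergeDecisionSketch {
  static boolean shouldAddMergeJob(HiveConf conf, boolean hasMoveTask, boolean mapOnlyJob) {
    if (!hasMoveTask) {
      // regular SELECT with no move task: hive.merge.* settings are never checked
      return false;
    }
    return mapOnlyJob
        ? conf.getBoolVar(HiveConf.ConfVars.HIVEMERGEMAPFILES)
        : conf.getBoolVar(HiveConf.ConfVars.HIVEMERGEMAPREDFILES);
  }
}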





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-10 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845159#comment-13845159
 ] 

Eric Chu commented on HIVE-5970:


If we have already started using RLE 0.11 for some partitions of the table and 
want to switch to RLEv2, would we need to rewrite existing partitions in RLEv2? 
If not, would we need to set hive.exec.orc.write.format=0.11 when we query 
partitions with the old RLE, and set hive.exec.orc.write.format=null when we 
query partitions with RLEv2?
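
To make sure I'm asking the right question, this is the kind of per-session 
switch I have in mind (sketch only; it assumes hive.exec.orc.write.format is 
honored by the ORC writer as documented):

import org.apache.hadoop.hive.conf.HiveConf;

// Hypothetical snippet, not a confirmed recommendation.
public class OrcWriteFormatSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    conf.set("hive.exec.orc.write.format", "0.11"); // write old-style RLE (v1)
    // Leaving the property unset would let the writer use the 0.12 default (RLEv2).
    System.out.println(conf.get("hive.exec.orc.write.format"));
  }
}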

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
> Attachments: test_data
>
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more



--
This message was sent by Atlassian JIRA

[jira] [Created] (HIVE-6005) BETWEEN is broken after using KRYO

2013-12-10 Thread Eric Chu (JIRA)
Eric Chu created HIVE-6005:
--

 Summary: BETWEEN is broken after using KRYO
 Key: HIVE-6005
 URL: https://issues.apache.org/jira/browse/HIVE-6005
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Eric Chu


After taking in HIVE-1511, HIVE-5422, and HIVE-5257 on top of Hive 0.12 to use 
Kryo, queries with BETWEEN start to fail with the following exception:

com.esotericsoftware.kryo.KryoException: Class cannot be created (missing 
no-arg constructor): 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantBooleanObjectInspector
Serialization trace:
argumentOIs (org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween)
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
filters (org.apache.hadoop.hive.ql.plan.JoinDesc)
conf (org.apache.hadoop.hive.ql.exec.JoinOperator)
reducer (org.apache.hadoop.hive.ql.plan.ReduceWork)
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526)
...

A workaround is to replace BETWEEN with >= and <=, but I think this failure is 
a bug and not by design. 
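
In case it helps, the failure can be reproduced outside Hive with a minimal Kryo 
snippet (illustrative only; the demo class below is hypothetical, not a Hive class):

import com.esotericsoftware.kryo.Kryo;

public class KryoNoArgDemo {
  // Mirrors the object inspectors named in the trace: no no-arg constructor.
  static class NoDefaultCtor {
    NoDefaultCtor(int x) { }
  }

  public static void main(String[] args) {
    Kryo kryo = new Kryo();
    // Throws KryoException: Class cannot be created (missing no-arg constructor)
    kryo.newInstance(NoDefaultCtor.class);
  }
}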



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-05 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840743#comment-13840743
 ] 

Eric Chu commented on HIVE-5970:


Or pl could be 0, which would also leave unpackedPatch empty.

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-05 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840730#comment-13840730
 ] 

Eric Chu commented on HIVE-5970:


Looking at the code where the exception occurs, it seems like the array 
unpackedPatch could be empty after SerializationUtils.readInts(), but there is 
no check before the array is accessed:

// unpack the patch blob
long[] unpackedPatch = new long[pl];
SerializationUtils.readInts(unpackedPatch, 0, pl, pw + pgw, input);

// apply the patch directly when decoding the packed data
int patchIdx = 0;
long currGap = 0;
long currPatch = 0;
currGap = unpackedPatch[patchIdx] >>> pw;
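
A minimal defensive variant of that access, just to illustrate the missing check 
(this is not a proposed patch; pl, pw, and unpackedPatch mirror the locals in 
readPatchedBaseValues()):

import java.io.IOException;

final class PatchedBaseGuard {
  static long firstGap(long[] unpackedPatch, int pl, int pw) throws IOException {
    if (pl <= 0 || unpackedPatch.length == 0) {
      // fail with a readable error instead of ArrayIndexOutOfBoundsException: 0
      throw new IOException("Empty patch list in patched-base encoded ORC run");
    }
    return unpackedPatch[0] >>> pw;
  }
}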




> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more

[jira] [Updated] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-05 Thread Eric Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Chu updated HIVE-5970:
---

Labels: orcfile  (was: )

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5623) ORC accessing array column that's empty will fail with java out of bound exception

2013-12-05 Thread Eric Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Chu updated HIVE-5623:
---

Labels: orcfile  (was: )

> ORC accessing array column that's empty will fail with java out of bound 
> exception
> --
>
> Key: HIVE-5623
> URL: https://issues.apache.org/jira/browse/HIVE-5623
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
>
> In our ORC tests we saw that queries that work on RCFile failed on the 
> corresponding ORC version with Java IndexOutOfBoundsException in 
> OrcStruct.java. The queries failed b/c the table has an array type column and 
> there are rows with an empty array.  We noticed that the getList(Object list, 
> int i) method in OrcStruct.java simply returns the i-th element from list 
> without checking if list is not null or if i is within valid range. After 
> fixing that the queries run fine. The fix is really simple, but maybe there 
> are other similar cases that need to be handled.
> The fix is to check if listObj is null and if i falls within range:
> public Object getListElement(Object listObj, int i) {
>   if (listObj == null) {
>     return null;
>   }
>   List list = ((List) listObj);
>   if (i < 0 || i >= list.size()) {
>     return null;
>   }
>   return list.get(i);
> }



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-05 Thread Eric Chu (JIRA)
Eric Chu created HIVE-5970:
--

 Summary: ArrayIndexOutOfBoundsException in 
RunLengthIntegerReaderV2.java
 Key: HIVE-5970
 URL: https://issues.apache.org/jira/browse/HIVE-5970
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.12.0
Reporter: Eric Chu
Priority: Critical


A workload involving ORC tables starts getting the following 
ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
added as part of HIVE-4123. 

2013-12-04 14:42:08,537 ERROR 
cause:java.io.IOException: java.io.IOException: 
java.lang.ArrayIndexOutOfBoundsException: 0
2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: java.io.IOException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
... 11 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
at 
org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
at 
org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
at 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
... 15 more



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5623) ORC accessing array column that's empty will fail with java out of bound exception

2013-10-23 Thread Eric Chu (JIRA)
Eric Chu created HIVE-5623:
--

 Summary: ORC accessing array column that's empty will fail with 
java out of bound exception
 Key: HIVE-5623
 URL: https://issues.apache.org/jira/browse/HIVE-5623
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Eric Chu
Priority: Critical


In our ORC tests we saw that queries that work on RCFile failed on the 
corresponding ORC version with Java IndexOutOfBoundsException in 
OrcStruct.java. The queries failed b/c the table has an array type column and 
there are rows with an empty array.  We noticed that the getList(Object list, 
int i) method in OrcStruct.java simply returns the i-th element from list 
without checking if list is not null or if i is within valid range. After 
fixing that the queries run fine. The fix is really simple, but maybe there are 
other similar cases that need to be handled.
The fix is to check if listObj is null and if i falls within range:

public Object getListElement(Object listObj, int i) {
  if (listObj == null) {
    return null;
  }
  List list = ((List) listObj);
  if (i < 0 || i >= list.size()) {
    return null;
  }
  return list.get(i);
}
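
A quick standalone check of the guarded accessor (hypothetical harness; the 
method body is just a copy of the fix above):

import java.util.Collections;
import java.util.List;

public class GetListElementCheck {
  static Object getListElement(Object listObj, int i) {
    if (listObj == null) {
      return null;
    }
    List list = (List) listObj;
    if (i < 0 || i >= list.size()) {
      return null;
    }
    return list.get(i);
  }

  public static void main(String[] args) {
    List<String> empty = Collections.emptyList();
    System.out.println(getListElement(empty, 0)); // null instead of IndexOutOfBoundsException
    System.out.println(getListElement(null, 3));  // null instead of a NullPointerException
  }
}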





--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3991) junit failure on Semantic Analysis

2013-09-07 Thread Eric Chu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761145#comment-13761145
 ] 

Eric Chu commented on HIVE-3991:


Is there any update to this? I also got the same error when I ran unit tests on 
just the Hive 11 branch.

> junit failure on Semantic Analysis
> --
>
> Key: HIVE-3991
> URL: https://issues.apache.org/jira/browse/HIVE-3991
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.9.0
> Environment: run multiple times, with different OSes/AntLibs
>  - (OsX) Apache Ant(TM) version 1.8.2 compiled on March 26 2012
>  - (Linux) Apache Ant version 1.7.1 compiled on July 2 2010
>Reporter: Guido Serra aka Zeph
>Priority: Critical
> Attachments: tests_jh.log.gz, tests.log.gz
>
>
> hi, can't apply patches neither change the code cause there are failures
>  in junits from code that is supposed to be stable and release
> bq. * release-0.9.0-rc2 974918c Hive 0.9.0-rc0 release.
> bq. [junit] Tests run: 45, Failures: 6, Errors: 0, Time elapsed: 633.608 
> sec
> {code}
> [junit] diff /home/zeph/hive/build/ql/test/logs/positive/groupby1.q.out 
> /home/zeph/hive/ql/src/test/results/compiler/parse/groupby1.q.out
> [junit] diff -b 
> /home/zeph/hive/build/ql/test/logs/positive/groupby1.q.xml 
> /home/zeph/hive/ql/src/test/results/compiler/plan/groupby1.q.xml
> [junit] 524,525c524
> [junit] < method="valueOf">
> [junit] < 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$Mode
> [junit] ---
> [junit] > class="org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$Mode" 
> method="valueOf"> 
> [junit] 602,603c601
> [junit] < method="valueOf">
> [junit] < 
> org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode
> [junit] ---
> [junit] > class="org.apache.hadoop.hive.ql.plan.GroupByDesc$Mode" method="valueOf"> 
> [junit] 1357,1358c1355
> [junit] <  
> [junit] <   
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator$Mode
> [junit] ---
> [junit] junit.framework.AssertionFailedError: Semantic Analysis has 
> unexpected output with error code = 1
> [junit] See build/ql/tmp/hive.log, or try "ant test ... 
> -Dtest.silent=false" to get more logs.
> [junit] at junit.framework.Assert.fail(Assert.java:47)
> [junit] at 
> org.apache.hadoop.hive.ql.parse.TestParse.testParse_groupby1(TestParse.java:214)
> [junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) 
> [junit] at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> [junit] at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit] at java.lang.reflect.Method.invoke(Method.java:616)
> [junit] at junit.framework.TestCase.runTest(TestCase.java:154)
> [junit] at junit.framework.TestCase.runBare(TestCase.java:127)
> [junit] at junit.framework.TestResult$1.protect(TestResult.java:106)
> [junit] at 
> junit.framework.TestResult.runProtected(TestResult.java:124)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira