[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594095#comment-16594095 ]

Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

What error message do you see in the logs?

> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>            Priority: Major
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
> Presently, the HBase Hive integration supports querying only primitive data
> types in columns. It would be nice to be able to store and query Avro objects
> in HBase columns by making them visible as structs to Hive. This will allow
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033442#comment-16033442 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

[~zsombor.klara] Thanks a lot for taking a look at this. +1 on the patch from me. Anything else we need to do to get this merged in?

> Capability to add a filter to hbase scan via composite key doesn't work
> -----------------------------------------------------------------------
>
>                 Key: HIVE-11609
>                 URL: https://issues.apache.org/jira/browse/HIVE-11609
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: Swarnim Kulkarni
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-11609.08.patch, HIVE-11609.09.patch, HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, HIVE-11609.3.patch.txt, HIVE-11609.4.patch.txt, HIVE-11609.5.patch, HIVE-11609.6.patch.txt, HIVE-11609.7.patch.txt
>
> It seems like the capability to add a filter to an HBase scan, which was added
> as part of HIVE-6411, doesn't work. This is primarily because in
> HiveHBaseInputFormat the filter is added in getSplits instead of
> getRecordReader. This works fine for start and stop keys, but not for a filter,
> because a filter is respected only when an actual scan is performed. This is
> also related to the initial refactoring done as part of HIVE-3420.
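The distinction the description draws (split-time pruning vs. per-row filtering) can be sketched in plain Java. This is a simplified model with illustrative names, not the real HBase Scan/InputFormat API: key ranges can prune work when splits are computed, but a row filter only takes effect while the record reader actually iterates rows, which is why attaching it in getSplits was effectively a no-op.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Simplified model of the bug; names are illustrative, not the HBase/Hive API.
class ScanFilterSketch {
    static class Split {
        final int start, stop;
        Split(int start, int stop) { this.start = start; this.stop = stop; }
    }

    // getSplits-time: only the key range matters here. Attaching a row filter
    // to the scan used for split computation (the original bug) has no effect,
    // because nothing iterates rows at this stage.
    static List<Split> getSplits(int startKey, int stopKey, int regionSize) {
        List<Split> splits = new ArrayList<>();
        for (int s = startKey; s < stopKey; s += regionSize) {
            splits.add(new Split(s, Math.min(s + regionSize, stopKey)));
        }
        return splits;
    }

    // getRecordReader-time: the filter must be applied here, since it is
    // evaluated per row while the scan actually runs.
    static List<Integer> scan(Split split, Predicate<Integer> filter) {
        List<Integer> rows = new ArrayList<>();
        for (int row = split.start; row < split.stop; row++) {
            if (filter.test(row)) rows.add(row);
        }
        return rows;
    }
}
```

The patch discussed in this thread moves the filter wiring from the first stage to the second, matching where the scan is actually executed.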
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995810#comment-15995810 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Hey [~zsombor.klara]. Please feel free to take a stab at it if you have time. Unfortunately I am very occupied for the next few weeks, but I can definitely help answer any questions.
[jira] [Updated] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-6147:
-----------------------------------

    Labels:   (was: TODOC14)

Docs look good. Removing the label.
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment: HIVE-11609.7.patch.txt
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959709#comment-15959709 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Rebased and updated patch.
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928510#comment-15928510 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Can one of the Hive committers take a quick look at this one?
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment: HIVE-11609.6.patch.txt
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment:   (was: HIVE-11609.6.patch.txt)
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment: HIVE-11609.6.patch.txt

[~ychena] Sorry for the late reply on this. I added that code back to the test file as you suggested and have all the tests passing locally as well [1]. Attaching the latest patch to be run on Jenkins.

[1] http://pastebin.com/gX52nivF
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126803#comment-15126803 ]

Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{quote}
Avro supports schema evolution that allows data to be written with one schema and read with another
{quote}
Yup, definitely agree. However, the point I was trying to make is that you would still need to provide the exact schema that was used when the data was written. Let's take an example. Say you used schema S1 to write a billion rows to HBase. The schema then evolved to S2 (hopefully in a compatible way) and you wrote another billion rows with it. The schema evolved again to S3, and you wrote another billion rows. To read all of this data back, you would need:

1st billion rows: Writer Schema: S1, Reader Schema: S3
2nd billion rows: Writer Schema: S2, Reader Schema: S3
3rd billion rows: Writer Schema: S3, Reader Schema: S3

So as you can see, you still provide the *exact same version* of the schema that was used to write the data in order to read it back successfully. Without it, it would be extremely hard for Avro to make head or tail of the data. You "might" get lucky and be able to deserialize the 1st billion rows using S3 as both reader and writer schema, but there are absolutely no guarantees. That is why you still need some way to track which schema was used to persist the data when you read it back, and the current design of the Hive/HBase Avro support closely follows that pattern.
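The writer/reader pairings above can be illustrated with a toy model of Avro's positional encoding. This is a hand-rolled mini-resolver, not the Avro API: values are written positionally in the writer schema's field order, which is exactly why the reader must know the writer's schema to consume them, while fields new to the reader schema fall back to defaults.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of Avro-style schema resolution; a "schema" here is just an
// ordered list of field names, standing in for Avro's record schemas.
class SchemaResolutionSketch {
    // Encoding stores values positionally, in writer-schema field order,
    // the way Avro binary encoding does (no field names in the payload).
    static List<Object> encode(List<String> writerSchema, Map<String, Object> record) {
        List<Object> encoded = new ArrayList<>();
        for (String field : writerSchema) encoded.add(record.get(field));
        return encoded;
    }

    // Resolution: walk the WRITER schema to consume positional values, then
    // project onto the READER schema, filling new fields from defaults.
    static Map<String, Object> decode(List<Object> data, List<String> writerSchema,
                                      List<String> readerSchema, Map<String, Object> defaults) {
        Map<String, Object> byName = new LinkedHashMap<>();
        for (int i = 0; i < writerSchema.size(); i++) byName.put(writerSchema.get(i), data.get(i));
        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : readerSchema)
            result.put(field, byName.containsKey(field) ? byName.get(field) : defaults.get(field));
        return result;
    }

    public static void main(String[] args) {
        List<String> s1 = List.of("id", "name");          // writer schema, v1
        List<String> s3 = List.of("id", "name", "email"); // reader schema, v3
        List<Object> row = encode(s1, Map.of("id", 1, "name", "a"));
        // Decoding REQUIRES s1: treating the payload as if written with s3
        // would look for a third positional value that was never written.
        System.out.println(decode(row, s1, s3, Map.of("email", "n/a")));
        // {id=1, name=a, email=n/a}
    }
}
```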
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125676#comment-15125676 ]

Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{quote}
It is pretty common to use schema-less avro objects in HBase.
{quote}
I am not sure that is true (if it is possible at all). As far as my understanding goes, you will almost always have to provide the exact schema that was used when persisting the data in order to deserialize it, and the best way to do that is to store the schema alongside the data. Plus, schema evolution is going to be a mess. Imagine writing a billion rows into HBase with one schema, which then evolves, after which you write another billion rows with the new schema. How do you ensure the first billion rows are still correctly readable?
{quote}
(if there are billions of rows with objects of the same type, it is not reasonable to store the same schema in all of them) and it is not convenient to write a customer schema retriever for each such case.
{quote}
Correct. I agree it is inefficient to store it for every single cell, although IMO that isn't a good excuse to not write the schema at all. A better design in this case is to use some kind of schema registry: use a custom serializer, write the schema to the registry, generate an id of some kind, and persist the id along with the data. When you read the data back, use the id to pull the schema from the registry and deserialize. That is also where a custom implementation of AvroSchemaRetriever makes sense: your implementation would know how to read your schema from the schema registry and hand it to Hive, letting Hive handle the deserialization from there on.
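The registry design described in the comment can be sketched in a few lines. All names here (SchemaRegistrySketch, register, etc.) are hypothetical, not a real Hive or Avro API: each cell carries a small fixed-size schema id instead of the full schema, and the reader resolves the id back to the schema through the registry.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical schema-registry sketch: store a 4-byte schema id per cell
// instead of the full schema, and resolve it back at read time.
class SchemaRegistrySketch {
    private final Map<Integer, String> idToSchema = new HashMap<>();
    private final Map<String, Integer> schemaToId = new HashMap<>();
    private int nextId = 0;

    // Register a schema once; reuse the same id for identical schemas.
    int register(String schemaJson) {
        return schemaToId.computeIfAbsent(schemaJson, s -> {
            idToSchema.put(nextId, s);
            return nextId++;
        });
    }

    // Persist: 4-byte schema-id prefix + payload, instead of schema + payload.
    byte[] write(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length).putInt(schemaId).put(payload).array();
    }

    // Read: recover the id, look the schema back up, return both so the
    // caller can deserialize the payload with the correct writer schema.
    Map.Entry<String, byte[]> read(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        String schema = idToSchema.get(buf.getInt());
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return Map.entry(schema, payload);
    }
}
```

A production registry would of course be an external shared service rather than an in-memory map, but the per-cell layout (id prefix + payload) is the essence of the design.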
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125126#comment-15125126 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

[~ychena] Thanks for taking a look. Unfortunately I am not 100% sure at this point why, but looking at the sequence of patches, it looks like that was added to address a test failure caused by the first patch.
[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns
[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125117#comment-15125117 ]

Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{noformat}
it tries to retrieve the write schema from data (ws = retrieveSchemaFromBytes(data)) even if the schema URL (reader schema) had been provided
{noformat}
Correct. That is the default behavior. The writer schema defaults to the reader schema if one has not been provided. If it has been (as in your case), it uses the reader schema from the given URL but still defaults to the writer schema from the data. If you want to provide the writer schema as well, I would recommend taking a look at AvroSchemaRetriever [1]. You can supply a custom implementation of it and provide both the reader and writer schemas from any custom source you like. A test implementation can be found here for reference [2], and the corresponding test that uses it here [3]. Once done, simply plug it in with the "avro.schema.retriever" property. One caveat: this currently applies to the whole table rather than to individual columns, so it assumes a uniform schema across the table. Hope this helps. Let me know if there are any additional questions.
[1] https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSchemaRetriever.java
[2] https://github.com/apache/hive/blob/release-1.2.1/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestAvroSchemaRetriever.java
[3] https://github.com/apache/hive/blob/release-1.2.1/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java#L1293-L1344
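As a rough illustration of how a registry-backed retriever could look, here is a self-contained stand-in. SchemaRetriever below is a simplified local mirror of the AvroSchemaRetriever contract linked in [1] (the real abstract class works with Avro Schema objects; plain Strings are used here so the sketch compiles without Hive on the classpath), and RegistryBackedRetriever is a hypothetical implementation that resolves schemas by an id carried with the data.

```java
import java.util.Map;

// Simplified local stand-in for the retriever contract; NOT the real Hive
// class, which lives in org.apache.hadoop.hive.serde2.avro and returns
// org.apache.avro.Schema rather than String.
abstract class SchemaRetriever {
    abstract String retrieveWriterSchema(Object source);
    // The reader schema is optional; a null means "fall back to defaults".
    String retrieveReaderSchema(Object source) { return null; }
}

// Hypothetical implementation: resolve the writer schema from an external
// registry, keyed by a schema id persisted alongside the row data.
class RegistryBackedRetriever extends SchemaRetriever {
    private final Map<Integer, String> registry;

    RegistryBackedRetriever(Map<Integer, String> registry) {
        this.registry = registry;
    }

    @Override
    String retrieveWriterSchema(Object source) {
        // 'source' is assumed to carry the schema id stored with the data.
        return registry.get(source);
    }
}
```

With the real class, the analogous implementation would be wired in through the "avro.schema.retriever" table property mentioned above.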
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090312#comment-15090312 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

[~ychena] Any way to get the tests running again? It seems like reattaching the patch did not kick off the precommit build.
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment: HIVE-11609.4.patch.txt

Rebased and attached a new patch to rerun the tests. I have, however, been observing some flakiness locally after rebasing. Hopefully this will help debug whether they fail on Jenkins as well.
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058186#comment-15058186 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Thanks for looking into this [~ychena]. It could also be because this patch is currently out of date with master; I am not getting any of these failures locally. Let me try to rebase it against master and see if that helps.
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swarnim Kulkarni updated HIVE-11609:
------------------------------------

    Attachment: HIVE-11609.3.patch.txt

Reattaching the patch, rebased against master and with very minor updates.
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947948#comment-14947948 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

[~ashutoshc] Mind giving this another quick look and letting me know if my comment [here|https://issues.apache.org/jira/browse/HIVE-11609?focusedCommentId=14935951&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14935951] makes sense?
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935951#comment-14935951 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

[~ashutoshc] So I looked into this a little, and it looks like the fix from HIVE-10940 isn't really going to work, mostly because it is pretty tailored to Tez. SerializeFilter is only called from TezCompiler [1]. Any reason it is not called from MapReduceCompiler or SparkCompiler too? As a quick hack, I tried calling it within MapReduceCompiler in "optimizeTaskPlan", but that doesn't seem to work very well either. I might need to dig a little deeper into what's going on there. If it's ok with you, I would like to log a separate bug for that and tackle it there, just to keep it separate from what we are trying to do here. If that works, we can re-add the "transient" and only go the SerializeFilter route.

[1] https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java#L486-L490
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933758#comment-14933758 ]

Swarnim Kulkarni commented on HIVE-11609:
-----------------------------------------

Thanks a lot [~ashutoshc] for taking the time. I apologize for not getting back to you earlier; I've been pretty swamped with other things.
{quote}
you have removed transient modifier on filterObject. Any reason for that?
{quote}
Yes. That is because the scan filters are now correctly handled inside the getRecordReader method, unlike previously when they were handled in getSplits. Unlike the start and stop keys on a scan, a scan filter cannot be used to prune regions ahead of time; it is only used when the scan actually runs. So when the individual tasks run, they read the filterObject that was previously serialized by Hive as part of its execution plan and retrieve the scan filter from it. With the transient in place, Hive was serializing everything except the filter object, which was needed. So I had to modify that code to remove the "transient" so those objects could be serialized as well and be available to the scan.
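The effect described, the plan serializer skipping the transient field, can be demonstrated with standard Java serialization; Kryo's default field serializer likewise ignores transient fields. This is a generic sketch with illustrative names and values, not Hive's actual plan-serialization code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Generic demo: a transient field is dropped during serialization, so the
// deserializing side (here standing in for the task) sees null.
class TransientDemo {
    static class Plan implements Serializable {
        String startKey = "row-000";                      // serialized normally
        transient String filterObject = "KeyOnlyFilter";  // skipped: lost in transit

        // Serialize to bytes and read back, emulating shipping the plan to a task.
        Plan roundTrip() {
            try {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                ObjectOutputStream oos = new ObjectOutputStream(bos);
                oos.writeObject(this);
                oos.flush();
                ObjectInputStream in =
                        new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
                return (Plan) in.readObject();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }

    public static void main(String[] args) {
        Plan copy = new Plan().roundTrip();
        System.out.println(copy.startKey);     // row-000
        System.out.println(copy.filterObject); // null
    }
}
```

This matches the symptom in the comment: keys survive the round trip while a transient filter object arrives as null, which is why the modifier had to be removed for the filter to reach the scan.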
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933837#comment-14933837 ] Swarnim Kulkarni commented on HIVE-11609: - It would currently read the scan object from the jobConf, and my understanding was that, with the way Kryo works with Hive, a lot of this serialization and deserialization happens via the jobConf and is controlled here[1]. So in order for that object to get into the jobConf, the field had to be "de-transient-ized". [1] https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L787-L798
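The jobConf mechanism mentioned above can be sketched in plain Java: a configuration only carries Strings, so an object has to be serialized and encoded to ride along. The property name and helper methods here are illustrative, not Hive's actual Utilities API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Map;

// Sketch of marshaling a plan object through a String-only conf map,
// in the spirit of how Hive pushes objects through the JobConf.
class ConfMarshal {
    static void setObject(Map<String, String> conf, String key, Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.flush();
            // conf values must be Strings, hence the Base64 step
            conf.put(key, Base64.getEncoder().encodeToString(bos.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object getObject(Map<String, String> conf, String key) {
        try {
            byte[] raw = Base64.getDecoder().decode(conf.get(key));
            return new ObjectInputStream(new ByteArrayInputStream(raw)).readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Anything the serializer skips (such as a transient field) simply never makes it into the conf, which is the failure mode discussed in this thread.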
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934043#comment-14934043 ] Swarnim Kulkarni commented on HIVE-11609: - Hmm. If that is the case, I think it should have worked without this change. I'm not 100% sure, then, why my testing fails with a null value for that field unless I take the transient out. Can you point me to the class with the serializedFilterObject and I will dig deeper.
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934209#comment-14934209 ] Swarnim Kulkarni commented on HIVE-11609: - Ahh, I see what is going on here. I built this patch off a distribution[1] and tested it out. The distribution apparently did not have HIVE-10940 in it, and hence no SerializeFilter class that you refer to, which is why I had to remove the transient to be able to serialize the filter object. I looked through HIVE-10940, and it seems like that should eliminate the need to do this. I am going to try this class and see if it fits the bill. Nice one [~ashutoshc]! [1] https://github.com/apache/hive/tree/cdh5.4.5-release/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical
[jira] [Commented] (HIVE-11329) Column prefix in key of hbase column prefix map
[ https://issues.apache.org/jira/browse/HIVE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743507#comment-14743507 ] Swarnim Kulkarni commented on HIVE-11329: - +1 on the doc. > Column prefix in key of hbase column prefix map > --- > > Key: HIVE-11329 > URL: https://issues.apache.org/jira/browse/HIVE-11329 > Project: Hive > Issue Type: Improvement > Components: HBase Handler >Affects Versions: 0.14.0 >Reporter: Wojciech Indyk >Assignee: Wojciech Indyk >Priority: Minor > Labels: TODOC1.3 > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11329.3.patch > > > When I create a table with hbase column prefix > https://issues.apache.org/jira/browse/HIVE-3725 I have the prefix in result > map in hive. > E.g. record in HBase > rowkey: 123 > column: tag_one, value: 0.5 > column: tag_two, value: 0.5 > representation in Hive via column prefix mapping "tag_.*": > column: tag map> key: tag_one, value: 0.5 > key: tag_two, value: 0.5 > should be: > key: one, value: 0.5 > key: two, value: 0.5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
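The qualifier handling HIVE-11329 asks for can be sketched as follows. Given a column prefix mapping such as "tag_.*", the literal prefix should be dropped from the map key; the method and flag names here are illustrative, not Hive's actual serde code:

```java
// Sketch of stripping the mapping's literal prefix from an HBase qualifier
// before using it as the Hive map key. Purely illustrative naming.
class PrefixMapKey {
    static String mapKey(String qualifier, String mappingSpec, boolean hidePrefix) {
        // the literal prefix is everything before the ".*" wildcard in the mapping spec
        int star = mappingSpec.indexOf(".*");
        String prefix = (star >= 0) ? mappingSpec.substring(0, star) : mappingSpec;
        if (hidePrefix && qualifier.startsWith(prefix)) {
            return qualifier.substring(prefix.length());
        }
        return qualifier; // old behavior: keep the full qualifier as the key
    }
}
```

With hidePrefix enabled, "tag_one" maps to the key "one"; with it disabled, the old "tag_one" key is preserved for backwards compatibility.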
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743520#comment-14743520 ] Swarnim Kulkarni commented on HIVE-11609: - {quote} In one of .q tests, following line is removed : filterExpr: ((key.col1 = '238') and (key.col2 = '1238')) (type: boolean) which indicates filter was not pushed to TableScanOp. {quote} That's not really true. While working on this issue I also found that the pushdown predicates were apparently getting handled twice, once by the storage handler and once by Hive, when they should be handled by only one of them (we should probably log another bug for that). So the tests were passing entirely because Hive was handling the predicates; the predicates were not even getting converted to the HBase filter. After this fix, the test composite key factory implementation passed to the query will start handling the predicates. That said, I am not entirely sure at this point how that line actually got removed. I'll take a look.
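The double-handling problem described above comes down to a simple contract: predicates the storage handler accepts for pushdown must be removed from Hive's residual list, so each predicate is evaluated exactly once. A minimal sketch of that decomposition (names are illustrative, not Hive's actual decomposePredicate API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative split of predicates into a pushed set (handled by the storage
// handler, e.g. as an HBase filter) and a residual set (handled by Hive).
class PredicateDecomposition {
    final List<String> pushed = new ArrayList<>();
    final List<String> residual = new ArrayList<>();

    static PredicateDecomposition decompose(List<String> predicates, Set<String> supported) {
        PredicateDecomposition d = new PredicateDecomposition();
        for (String p : predicates) {
            // each predicate lands in exactly one bucket, never both
            (supported.contains(p) ? d.pushed : d.residual).add(p);
        }
        return d;
    }
}
```

If a pushed predicate also stays in the residual, queries still return correct results, which is how the tests passed while the HBase filter was never built, masking the bug.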
[jira] [Commented] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer
[ https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738132#comment-14738132 ] Swarnim Kulkarni commented on HIVE-10708: - Test failures seem unrelated to this patch. > Add SchemaCompatibility check to AvroDeserializer > - > > Key: HIVE-10708 > URL: https://issues.apache.org/jira/browse/HIVE-10708 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Attachments: HIVE-10708.1.patch.txt > > > Avro provides a nice API[1] to check if the given reader schema can be used > to deserialize the data given its writer schema. I think it would be super > nice to integrate this into the AvroDeserializer so that we can fail fast and > gracefully if there is a bad schema compatibility > [1] > https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/SchemaCompatibility.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11756) Avoid redundant key serialization in RS for distinct query
[ https://issues.apache.org/jira/browse/HIVE-11756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736032#comment-14736032 ] Swarnim Kulkarni commented on HIVE-11756: - [~navis] Mind doing a quick RB for this? > Avoid redundant key serialization in RS for distinct query > -- > > Key: HIVE-11756 > URL: https://issues.apache.org/jira/browse/HIVE-11756 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Navis >Assignee: Navis >Priority: Trivial > Attachments: HIVE-11756.1.patch.txt, HIVE-11756.2.patch.txt > > > Currently hive serializes twice to know the length of distribution key for > distinct queries. This introduces IndexedSerializer to avoid this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer
[ https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736147#comment-14736147 ] Swarnim Kulkarni commented on HIVE-10708: - Decided to add a simple flag to turn this compatibility check on. Keeping this flag off by default for backwards compatibility.
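The flag-guarded check described above can be sketched in plain Java. The real API is Avro's SchemaCompatibility.checkReaderWriterCompatibility; this simplified stand-in only compares field names and is purely illustrative:

```java
import java.util.Set;

// Simplified, flag-guarded compatibility check in the spirit of the patch.
// Real Avro resolution also handles defaults, promotions, and aliases; this
// sketch only verifies that every reader field exists on the writer side.
class AvroCompatCheck {
    static void verify(boolean checkEnabled, Set<String> readerFields, Set<String> writerFields) {
        if (!checkEnabled) {
            return; // flag off by default, preserving old behavior
        }
        for (String f : readerFields) {
            if (!writerFields.contains(f)) {
                // fail fast with a clear message instead of a late deserialization error
                throw new IllegalArgumentException("Reader field not in writer schema: " + f);
            }
        }
    }
}
```

With the flag off, incompatible schemas pass through unchanged; with it on, the mismatch surfaces immediately at deserializer setup rather than mid-query.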
[jira] [Updated] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer
[ https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-10708: Attachment: HIVE-10708.1.patch.txt Patch attached.
[jira] [Commented] (HIVE-10708) Add SchemaCompatibility check to AvroDeserializer
[ https://issues.apache.org/jira/browse/HIVE-10708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736153#comment-14736153 ] Swarnim Kulkarni commented on HIVE-10708: - RB: https://reviews.apache.org/r/38203/
[jira] [Commented] (HIVE-11590) AvroDeserializer is very chatty
[ https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736171#comment-14736171 ] Swarnim Kulkarni commented on HIVE-11590: - RB: https://reviews.apache.org/r/38204/ > AvroDeserializer is very chatty > --- > > Key: HIVE-11590 > URL: https://issues.apache.org/jira/browse/HIVE-11590 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Attachments: HIVE-11590.1.patch.txt > > > It seems like AvroDeserializer is currently very chatty with it logging tons > of messages at INFO level in the mapreduce logs. It would be helpful to push > down some of these to debug level to keep the logs clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11590) AvroDeserializer is very chatty
[ https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11590: Attachment: HIVE-11590.1.patch.txt Patch attached.
[jira] [Updated] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11647: Attachment: HIVE-11647.2.patch.txt Attaching updated patch for testing. > Bump hbase version to 1.1.1 > --- > > Key: HIVE-11647 > URL: https://issues.apache.org/jira/browse/HIVE-11647 > Project: Hive > Issue Type: Sub-task > Components: HBase Handler >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Attachments: HIVE-11647.1.patch.txt, HIVE-11647.2.patch.txt > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-10990. - Resolution: Fixed > Compatibility Hive-1.2 an hbase-1.0.1.1 > --- > > Key: HIVE-10990 > URL: https://issues.apache.org/jira/browse/HIVE-10990 > Project: Hive > Issue Type: Bug > Components: Beeline, HBase Handler, HiveServer2 >Affects Versions: 1.2.0 >Reporter: gurmukh singh >Assignee: Swarnim Kulkarni > > Hive external table works fine with Hbase. > Hive-1.2 and hbase-1.0.1.1, hadoop-2.5.2 > Not able to create a table from hive in hbase. > 1: jdbc:hive2://edge1.dilithium.com:1/def> TBLPROPERTIES > ("hbase.table.name" = "xyz"); > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. > org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.DDLTask. > org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V > (state=08S01,code=1) > [hdfs@edge1 cluster]$ hive > 2015-06-12 17:56:49,952 WARN [main] conf.HiveConf: HiveConf of name > hive.metastore.local does not exist > Logging initialized using configuration in > jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties > SLF4J: Class path contains multiple SLF4J bindings. > SLF4J: Found binding in > [jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/auxlib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: Found binding in > [jar:file:/usr/local/cluster/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an > explanation. 
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] > hive> CREATE TABLE hbase_table_1(key int, value string) > > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") > > TBLPROPERTIES ("hbase.table.name" = "xyz"); > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. > org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V > === > scan complete in 1535ms > 14 driver classes found > Compliant Version Driver Class > no 5.1 com.mysql.jdbc.Driver > no 5.1 com.mysql.jdbc.NonRegisteringDriver > no 5.1 com.mysql.jdbc.NonRegisteringReplicationDriver > no 5.1 com.mysql.jdbc.ReplicationDriver > yes 1.2 org.apache.calcite.avatica.remote.Driver > yes 1.2 org.apache.calcite.jdbc.Driver > yes 1.0 org.apache.commons.dbcp.PoolingDriver > yes 10.11 org.apache.derby.jdbc.AutoloadedDriver > yes 10.11 org.apache.derby.jdbc.Driver42 > yes 10.11 org.apache.derby.jdbc.EmbeddedDriver > yes 10.11 org.apache.derby.jdbc.InternalDriver > no 1.2 org.apache.hive.jdbc.HiveDriver > yes 1.0 org.datanucleus.store.rdbms.datasource.dbcp.PoolingDriver > no 5.1 org.gjt.mm.mysql.Driver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728488#comment-14728488 ] Swarnim Kulkarni commented on HIVE-11647: - I am a little confused by this failure, especially the "Forbidden" exception. I ran this locally with tests and it all passed. {noformat} [INFO] Scanning for projects... [INFO] [INFO] [INFO] Building Hive HBase Handler 2.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-hbase-handler --- [INFO] Deleting /Users/sk018283/git-repo/apache/hive/hbase-handler/target [INFO] Deleting /Users/sk018283/git-repo/apache/hive/hbase-handler (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-no-snapshots) @ hive-hbase-handler --- [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-hbase-handler --- Downloading: https://s3-us-west-1.amazonaws.com/hive-spark/maven2/spark_2.10-1.3-rc1/org/pentaho/pentaho-aggdesigner/5.1.5-jhyde/pentaho-aggdesigner-5.1.5-jhyde.pom [WARNING] Invalid project model for artifact [pentaho-aggdesigner-algorithm:org.pentaho:5.1.5-jhyde]. It will be ignored by the remote resources Mojo. [INFO] [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hive-hbase-handler --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /Users/sk018283/git-repo/apache/hive/hbase-handler/src/main/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-hbase-handler --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ hive-hbase-handler --- [INFO] Compiling 37 source files to /Users/sk018283/git-repo/apache/hive/hbase-handler/target/classes [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java: Some input files use or override a deprecated API. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java: Recompile with -Xlint:deprecation for details. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java: Some input files use unchecked or unsafe operations. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] --- avro-maven-plugin:1.7.6:protocol (default) @ hive-hbase-handler --- [INFO] [INFO] --- build-helper-maven-plugin:1.7:add-test-source (add-test-sources) @ hive-hbase-handler --- [INFO] Test Source directory: /Users/sk018283/git-repo/apache/hive/hbase-handler/src/gen/avro/gen-java added. [INFO] [INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ hive-hbase-handler --- [INFO] Using 'UTF-8' encoding to copy filtered resources. 
[INFO] skip non existing resourceDirectory /Users/sk018283/git-repo/apache/hive/hbase-handler/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hbase-handler --- [INFO] Executing tasks main: [mkdir] Created dir: /Users/sk018283/git-repo/apache/hive/hbase-handler/target/tmp [mkdir] Created dir: /Users/sk018283/git-repo/apache/hive/hbase-handler/target/warehouse [mkdir] Created dir: /Users/sk018283/git-repo/apache/hive/hbase-handler/target/tmp/conf [copy] Copying 10 files to /Users/sk018283/git-repo/apache/hive/hbase-handler/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hbase-handler --- [INFO] Compiling 16 source files to /Users/sk018283/git-repo/apache/hive/hbase-handler/target/test-classes [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory2.java: Some input files use or override a deprecated API. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/SampleHBaseKeyFactory2.java: Recompile with -Xlint:deprecation for details. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/avro/Address.java: Some input files use unchecked or unsafe operations. [WARNING] /Users/sk018283/git-repo/apache/hive/hbase-handler/src/test/org/apache/hadoop/hive/hbase/avro/Address.java: Recompile with -Xlint:unchecked for details. [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @
[jira] [Commented] (HIVE-2987) SELECTing nulls returns nothing
[ https://issues.apache.org/jira/browse/HIVE-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728464#comment-14728464 ] Swarnim Kulkarni commented on HIVE-2987: [~oli...@mineallmeyn.com] This seems to work fine with the latest version of hive. Can you give that a shot and post back here so I can investigate further? > SELECTing nulls returns nothing > --- > > Key: HIVE-2987 > URL: https://issues.apache.org/jira/browse/HIVE-2987 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Affects Versions: 0.9.0 > Environment: Tested using 0.9.0rc1, hbase 0.92.1, hadoop 0.20.2-cdh3u2 >Reporter: Oliver Meyn >Priority: Critical > > Given an hbase table defined as 'test' with a single column family 'a', > rowkey of type string, and two "rows" as follows: > key:1,a:lat=60.0,a:long=50.0,a:precision=10 > key:2,a:lat=54 > And an hive table created overtop of it as follows: > CREATE EXTERNAL TABLE hbase_test ( > id STRING, > latitude STRING, > longitude STRING, > precision STRING > ) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = > ":key#s,a:lat#s,a:long#s,a:precision#s") > TBLPROPERTIES( > "hbase.table.name" = "test", > "hbase.table.default.storage.type" = "binary" > ); > The query SELECT id, precision FROM hbase_test WHERE id = '2' returns no > result. Expected behaviour is to return: > '2',NULL > If the query is changed to include a non-null result, eg SELECT id, latitude, > precision FROM hbase_test WHERE id = '2' the result is as expected: > '2','54',NULL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11705) refactor SARG stripe filtering for ORC into a method
[ https://issues.apache.org/jira/browse/HIVE-11705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726632#comment-14726632 ] Swarnim Kulkarni commented on HIVE-11705: - Left minor comments on RB. Otherwise +1 (NB) > refactor SARG stripe filtering for ORC into a method > > > Key: HIVE-11705 > URL: https://issues.apache.org/jira/browse/HIVE-11705 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11705.01.patch, HIVE-11705.patch > > > For footer cache PPD to metastore, we'd need a method to do the PPD. Tiny > item to create it on OrcInputFormat. > For metastore path, these methods will be called from expression proxy > similar to current objectstore expr filtering; it will change to have > serialized sarg and column list to come from request instead of conf; > includedCols/etc. will also come from request instead of assorted java > objects. > The types and stripe stats will need to be extracted from HBase. This is a > little bit of a problem, since ideally we want to be inside an HBase > filter/coprocessor. I'd need to take a look to see if this is possible... > since that filter would need to either deserialize orc, or we would need to > store types and stats information in some other, non-ORC manner on write. The > latter is probably a better idea, although it's dangerous because there's no > sync between this code and ORC itself. > Meanwhile minimize dependencies for stripe picking to essentials (and conf > which is easy to remove). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11691) Update debugging info on Developer FAQ
[ https://issues.apache.org/jira/browse/HIVE-11691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726596#comment-14726596 ] Swarnim Kulkarni commented on HIVE-11691: - I added a "Debugging" section on the DeveloperFAQ wiki[1]. Can someone do a quick review of this for me and let me know if it looks good? CC [~leftylev] [1] https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Debugging > Update debugging info on Developer FAQ > -- > > Key: HIVE-11691 > URL: https://issues.apache.org/jira/browse/HIVE-11691 > Project: Hive > Issue Type: Task >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > > The debugging info currently on [1] is very inadequate. This should be > updated for future developers. There is some info here[2] but it is pretty > sparse too and related to Ant. > [1] https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ > [2] https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726615#comment-14726615 ] Swarnim Kulkarni commented on HIVE-3725: This feature has been documented here[1]. Removing the TODOC label. [1] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily > Add support for pulling HBase columns with prefixes > --- > > Key: HIVE-3725 > URL: https://issues.apache.org/jira/browse/HIVE-3725 > Project: Hive > Issue Type: Improvement > Components: HBase Handler >Affects Versions: 0.9.0 >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Fix For: 0.12.0 > > Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, > HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt > > > Current HBase Hive integration supports reading many values from the same row > by specifying a column family. And specifying just the column family can pull > in all qualifiers within the family. > We should add in support to be able to specify a prefix for the qualifier and > all columns that start with the prefix would automatically get pulled in. A > wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-3725: --- Labels: (was: TODOC12)
[jira] [Updated] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11647: Attachment: HIVE-11647.1.patch.txt Reattaching patch to re-run tests.
[jira] [Updated] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11647: Attachment: (was: HIVE-11647.1.patch.txt)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726620#comment-14726620 ] Swarnim Kulkarni commented on HIVE-11609: - [~ndimiduk] When you have a chance, would you be able to take a quick look?
[jira] [Assigned] (HIVE-11691) Update debugging info on Developer FAQ
[ https://issues.apache.org/jira/browse/HIVE-11691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11691: --- Assignee: Swarnim Kulkarni Update debugging info on Developer FAQ -- Key: HIVE-11691 URL: https://issues.apache.org/jira/browse/HIVE-11691 Project: Hive Issue Type: Task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni The debugging info currently on [1] is very inadequate. This should be updated for future developers. There is some info here[2] but it is pretty sparse too and related to Ant. [1] https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ [2] https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14721411#comment-14721411 ] Swarnim Kulkarni commented on HIVE-10990: - I have updated this info on the Hive/HBase Integration wiki[1] to avoid confusion for consumers of this integration. [1] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration Compatibility Hive-1.2 an hbase-1.0.1.1 --- Key: HIVE-10990 URL: https://issues.apache.org/jira/browse/HIVE-10990 Project: Hive Issue Type: Bug Components: Beeline, HBase Handler, HiveServer2 Affects Versions: 1.2.0 Reporter: gurmukh singh Assignee: Swarnim Kulkarni Hive external table works fine with Hbase. Hive-1.2 and hbase-1.0.1.1, hadoop-2.5.2 Not able to create a table from hive in hbase. 1: jdbc:hive2://edge1.dilithium.com:1/def TBLPROPERTIES (hbase.table.name = xyz); FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V (state=08S01,code=1) [hdfs@edge1 cluster]$ hive 2015-06-12 17:56:49,952 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.local does not exist Logging initialized using configuration in jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties SLF4J: Class path contains multiple SLF4J bindings. 
SLF4J: Found binding in [jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/auxlib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/usr/local/cluster/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] hive CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = xyz); FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V === scan complete in 1535ms 14 driver classes found Compliant Version Driver Class no5.1 com.mysql.jdbc.Driver no5.1 com.mysql.jdbc.NonRegisteringDriver no5.1 com.mysql.jdbc.NonRegisteringReplicationDriver no5.1 com.mysql.jdbc.ReplicationDriver yes 1.2 org.apache.calcite.avatica.remote.Driver yes 1.2 org.apache.calcite.jdbc.Driver yes 1.0 org.apache.commons.dbcp.PoolingDriver yes 10.11 org.apache.derby.jdbc.AutoloadedDriver yes 10.11 org.apache.derby.jdbc.Driver42 yes 10.11 org.apache.derby.jdbc.EmbeddedDriver yes 10.11 org.apache.derby.jdbc.InternalDriver no1.2 org.apache.hive.jdbc.HiveDriver yes 1.0 org.datanucleus.store.rdbms.datasource.dbcp.PoolingDriver no5.1 org.gjt.mm.mysql.Driver -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11689) minor flow changes to ORC split generation
[ https://issues.apache.org/jira/browse/HIVE-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721413#comment-14721413 ] Swarnim Kulkarni commented on HIVE-11689: - Super minor comment on RB. +1(NB) > minor flow changes to ORC split generation > -- > > Key: HIVE-11689 > URL: https://issues.apache.org/jira/browse/HIVE-11689 > Project: Hive > Issue Type: Bug > Reporter: Sergey Shelukhin > Assignee: Sergey Shelukhin > Attachments: HIVE-11689.patch > > > There are two changes that would help future work on split PPD into HBase metastore. 1) Move non-HDFS split strategy determination logic into main thread from threadpool. 2) Instead of iterating thru the futures and waiting, use CompletionService to get futures in order of completion. That might be useful by itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
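[Editor's note] Point 2) in the HIVE-11689 description can be illustrated with the stock `java.util.concurrent` API. This is a minimal, self-contained sketch (task names and sleep durations are made up for the demo, not taken from the Hive patch): an `ExecutorCompletionService` hands back futures in the order tasks finish, so the consumer never blocks on a slow future while faster ones are already done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CompletionOrderDemo {
    // Returns task results in completion order, not submission order.
    public static List<String> drainInCompletionOrder() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        // Wrap the pool so finished tasks can be consumed as they complete.
        CompletionService<String> cs = new ExecutorCompletionService<>(pool);

        // Three tasks with different (arbitrary, illustrative) durations.
        cs.submit(() -> { Thread.sleep(300); return "slow"; });
        cs.submit(() -> { Thread.sleep(10);  return "fast"; });
        cs.submit(() -> { Thread.sleep(100); return "medium"; });

        List<String> order = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // take() blocks until *some* task finishes; there is no need to
            // poll each Future in the order it was submitted.
            order.add(cs.take().get());
        }
        pool.shutdown();
        return order;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(drainInCompletionOrder()); // fastest task first
    }
}
```

Compared with iterating over a list of futures and calling `get()` on each in submission order, this lets the split-generation loop react to whichever strategy finishes first.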
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721405#comment-14721405 ] Swarnim Kulkarni commented on HIVE-11609: - RB: https://reviews.apache.org/r/37930/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721410#comment-14721410 ] Swarnim Kulkarni commented on HIVE-10990: - Thanks [~leftylev]. I updated this info on the wiki[1]. Please do let me know if it looks fine to you. [1] https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722968#comment-14722968 ] Swarnim Kulkarni commented on HIVE-11609: - Some more info on my test environment:
1. My keys were salted, which means no hot regions and an even key distribution across regions.
2. No pre-splits when loading data, though I suspect the performance would have been even better if the regions were pre-split.
3. My CompositeKeyFactory implementation takes the successive predicates (key1="something" and key2="something2") and converts them to an appropriate scan filter. So results on those types of queries depend entirely on how the predicate-to-filter conversion logic is handled inside your custom implementation.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
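[Editor's note] The "successive predicates" idea from point 3 above can be sketched in plain Java. This is a hypothetical illustration, not the actual `CompositeKeyFactory` code: the class names, the `|` separator, and the string-based row keys are all made up for the demo, and a real implementation would build HBase `Filter` objects rather than compare strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CompositeKeyPrefixDemo {
    // Fold successive key-part equality predicates (key1 = "a", key2 = "x")
    // into a single row-key prefix. A gap in the sequence (say key1 and key3
    // but not key2) would stop the fold, which is why the parts must be
    // successive for the conversion to work.
    public static String toScanPrefix(List<String> successiveKeyParts, char sep) {
        StringBuilder prefix = new StringBuilder();
        for (String part : successiveKeyParts) {
            prefix.append(part).append(sep);
        }
        return prefix.toString();
    }

    // Roughly what a prefix-style scan filter does server-side: pass only
    // rows whose composite key starts with the folded prefix.
    public static List<String> scanWithPrefix(List<String> rowKeys, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String key : rowKeys) {
            if (key.startsWith(prefix)) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a|x|1", "a|y|2", "b|x|3");
        String prefix = toScanPrefix(Arrays.asList("a", "x"), '|');
        System.out.println(scanWithPrefix(rows, prefix)); // prints [a|x|1]
    }
}
```

The more successive key parts the query pins down, the longer the prefix and the fewer rows survive the scan, which matches the timing tables reported in this thread.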
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722971#comment-14722971 ] Swarnim Kulkarni commented on HIVE-11609: - RB request updated with latest patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11609: Attachment: HIVE-11609.2.patch.txt Updating patch to address the failing test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721403#comment-14721403 ] Swarnim Kulkarni commented on HIVE-11609: - [~sershe] [~gopalv] [~navis] Mind taking a look at this? One of the things that I had to do to get this working is to serialize the filterObject in TableScanDesc. This seems to be related to HIVE-10940. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721391#comment-14721391 ] Swarnim Kulkarni commented on HIVE-11609: - Here are the results from my testing with and without this patch applied. The table my_table used for this testing contains about 8 M rows.

*Restrict query by a single key part*:
Example query: select * from my_table where key.firstpart=something;
|| Memory (in MB) || With patch || Without patch ||
| 1500 | Out of memory | Out of memory |
| 3000 | 2.5 minutes | Out of memory |
| 6000 | 2.4 minutes | 23 minutes |

*Restrict query by multiple key parts*: (Note that the key parts must be successive for this to work)
Example query: select * from my_table where key.firstpart=something and key.secondpart=something2;
|| Memory (in MB) || With filter || Without filter ||
| 1500 | 23 sec | Out of memory |
| 3000 | 19 sec | Out of memory |
| 6000 | 18.8 sec | 24 minutes |

So the filter becomes more restrictive, and the scan more efficient, as the query gets more detailed and deeper into the key. To toggle between using the filter and not using it, I set the hive.optimize.ppd.storage flag to false so that no predicate pushdown happens.

Finally, a query without an M/R job:
*Restrict query by multiple key parts*: (No M/R job)
Example query: select * from my_table where key.firstpart=something and key.secondpart=something2;
|| Memory (in MB) || With filter || Without filter ||
| 3000 | 5 sec | 19 minutes |
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
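[Editor's note] The core bug in this issue — a filter attached in getSplits instead of getRecordReader is silently lost — can be sketched without any HBase dependency. This is a toy model, not Hive code: the "split" is just a contiguous index range, and `readSplit` stands in for the record reader's scan loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FilterPlacementDemo {
    // Reading one "split" (a contiguous key range). Start/stop bounds shape
    // the split itself, so they can be fixed when splits are computed; a
    // filter, by contrast, only does anything if it is applied here, during
    // the actual scan.
    public static List<String> readSplit(List<String> rows, int start, int stop,
                                         Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (int i = start; i < stop; i++) {
            String row = rows.get(i);
            if (filter.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a1", "a2", "b1", "b2");
        // The bounds alone (the split) cannot express the predicate; the
        // filter must travel to the reader or it is dropped on the floor.
        System.out.println(readSplit(rows, 0, 4, r -> r.startsWith("a"))); // prints [a1, a2]
    }
}
```

This mirrors the description above: start/stop keys work when set at split time, but a scan filter is respected only when the scan itself is performed, i.e. in the record reader.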
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11609: Attachment: HIVE-11609.1.patch.txt Uploading patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715187#comment-14715187 ] Swarnim Kulkarni commented on HIVE-10990: - {quote} Thus, you just need to recompile hive vs. the correct runtime hbase version; no source change required. {quote} Yeah, that's what I was referring to with my comment here[1]. Unfortunately, though, like I mentioned, I am not sure we can do this in general and then release, as that would break passivity for consumers on hbase 1.0. That is primarily why we are choosing to leave the hive 1.x stream on hbase 0.98.x, as that branch currently maintains backwards compatibility, and to bump hive 2.x to hbase 1.x. [1] https://issues.apache.org/jira/browse/HIVE-10990?focusedCommentId=14713591&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14713591 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
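[Editor's note] The "recompile hive vs. the correct runtime hbase version" advice quoted above ties directly to the `addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V` error reported in this issue. A minimal sketch of the mechanism, using toy stand-in classes rather than the real HTableDescriptor (the before/after method shapes are my assumption about the 0.98 → 1.x API change, so treat them as illustrative):

```java
import java.lang.reflect.Method;

public class DescriptorDemo {
    // Toy stand-ins for the two shapes of HTableDescriptor.addFamily:
    // an older void-returning form, and a newer form returning the
    // descriptor itself (for call chaining).
    public static class OldStyle {
        public void addFamily(String family) { }
    }
    public static class NewStyle {
        public NewStyle addFamily(String family) { return this; }
    }

    public static String returnTypeOf(Class<?> c) throws NoSuchMethodException {
        Method m = c.getMethod("addFamily", String.class);
        return m.getReturnType().getName();
    }

    public static void main(String[] args) throws Exception {
        // A compiled call site embeds the full method descriptor, return
        // type included. Bytecode built against one shape therefore fails
        // to link against the other at runtime, producing a
        // NoSuchMethodError like the addFamily(...)V one quoted above,
        // even though the source would compile fine against either.
        System.out.println(returnTypeOf(OldStyle.class)); // prints void
        System.out.println(returnTypeOf(NewStyle.class));
    }
}
```

That is why recompiling Hive against the HBase version actually on the classpath clears the error without any source change, and why mixing jars compiled against different HBase lines does not.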
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715202#comment-14715202 ] Swarnim Kulkarni commented on HIVE-10990: - {quote} Please don't use any 0.99. This was a developer preview of 1.0 and is not meant for use by anyone other than HBase developers, and at this point is an artifact of historical interest at best. {quote} Good call on this. I wasn't aware of that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713477#comment-14713477 ] Swarnim Kulkarni commented on HIVE-10990: - [~gurmukhd] Did you mention before that it was working on hive 1.1 and hbase 1.0? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712102#comment-14712102 ] Swarnim Kulkarni commented on HIVE-10990: - [~kevinl] Can you post full logs for this query? They should be in /tmp/user folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712215#comment-14712215 ] Swarnim Kulkarni commented on HIVE-10990: - [~kevinl] That is correct. However it should be noted that the branch 1.x of hive is going to stay on hbase 1.0 still to maintain passivity with older versions of hbase. Please follow the discussion here[1]. Branch 2.x of hive would be moving over to hbase 1.x. I created [2] to bump hbase version to 1.1.1. [1] https://www.mail-archive.com/dev@hive.apache.org/msg114984.html [2] https://issues.apache.org/jira/browse/HIVE-11647 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11647: Attachment: HIVE-11647.1.patch.txt Patch attached. Bump hbase version to 1.1.1 --- Key: HIVE-11647 URL: https://issues.apache.org/jira/browse/HIVE-11647 Project: Hive Issue Type: Sub-task Components: HBase Handler Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11647.1.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11647) Bump hbase version to 1.1.1
[ https://issues.apache.org/jira/browse/HIVE-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11647: --- Assignee: Swarnim Kulkarni Bump hbase version to 1.1.1 --- Key: HIVE-11647 URL: https://issues.apache.org/jira/browse/HIVE-11647 Project: Hive Issue Type: Sub-task Components: HBase Handler Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-7534) remove reflection from HBaseSplit
[ https://issues.apache.org/jira/browse/HIVE-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-7534: -- Assignee: Swarnim Kulkarni remove reflection from HBaseSplit - Key: HIVE-7534 URL: https://issues.apache.org/jira/browse/HIVE-7534 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.14.0 Reporter: Nick Dimiduk Assignee: Swarnim Kulkarni Priority: Minor HIVE-6584 does some reflection voodoo to work around the lack of HBASE-11555 for version hbase-0.98.3. This ticket is to bump the hbase dependency version and clean up that code once hbase-0.98.5 has released. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
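The "reflection voodoo" from HIVE-6584 that this ticket removes has a simple general shape: probe for the method available in one library version and fall back to the other. Below is a hedged, self-contained sketch of that pattern using an invented stand-in class and invented accessor names (`getLength`/`length`), not the real HBase or Hive split types:

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    // Stand-in for a class whose accessor name differs across library versions.
    static class OldApiSplit {
        long length() { return 42L; }  // older-style accessor
    }

    // Try the newer method name first, then fall back to the older one —
    // the shape of a version-tolerant reflection workaround.
    public static long splitLength(Object split) {
        for (String name : new String[] {"getLength", "length"}) {
            try {
                Method m = split.getClass().getDeclaredMethod(name);
                m.setAccessible(true);
                return (Long) m.invoke(split);
            } catch (NoSuchMethodException e) {
                // this version doesn't have that name; try the next candidate
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        throw new IllegalStateException("no length accessor found");
    }

    public static void main(String[] args) {
        System.out.println(splitLength(new OldApiSplit())); // 42
    }
}
```

Once the dependency is bumped to a release that ships the needed API (hbase-0.98.5 here), the direct call replaces the probe and the fallback branch is deleted.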
[jira] [Commented] (HIVE-11607) Export tables broken for data > 32 MB
[ https://issues.apache.org/jira/browse/HIVE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705059#comment-14705059 ] Swarnim Kulkarni commented on HIVE-11607: - +1(NB) Export tables broken for data > 32 MB - Key: HIVE-11607 URL: https://issues.apache.org/jira/browse/HIVE-11607 Project: Hive Issue Type: Bug Components: Import/Export Affects Versions: 1.0.0, 1.2.0, 1.1.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-11607.patch Broken for both hadoop-1 as well as hadoop-2 line -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11609: --- Assignee: Swarnim Kulkarni Capability to add a filter to hbase scan via composite key doesn't work --- Key: HIVE-11609 URL: https://issues.apache.org/jira/browse/HIVE-11609 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni It seems like the capability to add filter to an hbase scan which was added as part of HIVE-6411 doesn't work. This is primarily because in the HiveHBaseInputFormat, the filter is added in the getsplits instead of getrecordreader. This works fine for start and stop keys but not for filter because a filter is respected only when an actual scan is performed. This is also related to the initial refactoring that was done as part of HIVE-3420. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
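The bug described above comes down to where the scan is configured: a filter attached while computing splits never reaches the scan the record reader actually executes. Below is a minimal stand-alone sketch of the intended placement, using invented stub `Scan`/`Filter` classes rather than the real HBase API:

```java
import java.util.ArrayList;
import java.util.List;

public class FilterPlacement {
    // Illustrative stand-ins for HBase's Scan and Filter.
    static class Filter {
        final String expr;
        Filter(String expr) { this.expr = expr; }
    }
    static class Scan {
        Filter filter;
        void setFilter(Filter f) { filter = f; }
        // Pretend region scan: a filter only takes effect here, at scan time.
        List<String> run(List<String> rows) {
            List<String> out = new ArrayList<>();
            for (String r : rows) {
                if (filter == null || r.contains(filter.expr)) out.add(r);
            }
            return out;
        }
    }

    // The HIVE-11609 point in miniature: apply the pushed-down filter where
    // the record reader builds its Scan, not only during split computation.
    public static List<String> readWithFilter(List<String> rows, Filter pushed) {
        Scan scan = new Scan();
        scan.setFilter(pushed);  // must happen where the scan is constructed
        return scan.run(rows);
    }

    public static void main(String[] args) {
        List<String> rows = List.of("a-1", "b-2", "a-3");
        System.out.println(readWithFilter(rows, new Filter("a-")));
    }
}
```

Start and stop keys behave differently because they shape the splits themselves, which is why setting them in getSplits works while a filter set there is silently dropped.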
[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701938#comment-14701938 ] Swarnim Kulkarni commented on HIVE-10697:
-----------------------------------------

The RB has been updated too.

> ObjectInspectorConvertors#UnionConvertor does a faulty conversion
> -----------------------------------------------------------------
>
>                 Key: HIVE-10697
>                 URL: https://issues.apache.org/jira/browse/HIVE-10697
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>         Attachments: HIVE-10697.1.patch.txt, HIVE-10697.2.patch.txt
>
> Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field [1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below:
> {code}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
>   at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51)
>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391)
>   at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338)
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456)
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539)
>   at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395)
>   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154)
>   at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518)
>   ... 9 more
> {code}
> [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
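The fix can be reduced to a toy model: a union conversion must dispatch on the union's tag and convert the *field value*, never the inspector (or the union wrapper) itself — handing the converter an inspector is exactly what produced the ClassCastException above. The classes below are illustrative stand-ins, not Hive's ObjectInspector API:

```java
import java.util.List;
import java.util.function.Function;

public class UnionFieldConversion {
    // Minimal stand-in for a union value: a tag selecting the active branch,
    // plus the field value held by that branch.
    static class Union {
        final int tag;
        final Object field;
        Union(int tag, Object field) { this.tag = tag; this.field = field; }
    }

    // Pick the converter for the active branch and hand it the field value.
    public static Object convert(Union u, List<Function<Object, Object>> converters) {
        return converters.get(u.tag).apply(u.field);
    }

    // A small demonstration: branch 0 converts to String, branch 1 to int.
    public static Object demo() {
        List<Function<Object, Object>> converters = List.of(
            o -> o.toString(),
            o -> ((Number) o).intValue());
        return convert(new Union(1, 7L), converters);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 7
    }
}
```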
[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700683#comment-14700683 ] Swarnim Kulkarni commented on HIVE-10697: - [~hsubramaniyan] Would you mind reviewing the patch? ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field [1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11590) AvroDeserializer is very chatty
[ https://issues.apache.org/jira/browse/HIVE-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11590: --- Assignee: Swarnim Kulkarni AvroDeserializer is very chatty --- Key: HIVE-11590 URL: https://issues.apache.org/jira/browse/HIVE-11590 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni It seems like AvroDeserializer is currently very chatty with it logging tons of messages at INFO level in the mapreduce logs. It would be helpful to push down some of these to debug level to keep the logs clean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
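The cleanup requested here is the standard guard-and-demote pattern: move per-record messages from INFO down to debug, and guard any string construction behind a level check. A self-contained sketch using java.util.logging (Hive's own logging API differs, and the method names below are illustrative only):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietDeserializer {
    private static final Logger LOG = Logger.getLogger(QuietDeserializer.class.getName());

    public static Object deserialize(byte[] payload) {
        // Per-record detail goes to FINE (debug). INFO is reserved for rare
        // events, so MapReduce task logs aren't flooded with one line per row.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("deserializing record of " + payload.length + " bytes");
        }
        return new String(payload);  // placeholder for real Avro decoding
    }

    public static void main(String[] args) {
        System.out.println(deserialize("row".getBytes()));
    }
}
```

The `isLoggable` guard also avoids paying for message concatenation when debug output is disabled, which matters at per-row frequency.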
[jira] [Updated] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-10697: Attachment: HIVE-10697.1.patch.txt Patch attached. ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field.[1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10697) ObjectInspectorConvertors#UnionConvertor does a faulty conversion
[ https://issues.apache.org/jira/browse/HIVE-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700682#comment-14700682 ] Swarnim Kulkarni commented on HIVE-10697: - RB: https://reviews.apache.org/r/37563/ ObjectInspectorConvertors#UnionConvertor does a faulty conversion - Key: HIVE-10697 URL: https://issues.apache.org/jira/browse/HIVE-10697 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-10697.1.patch.txt Currently the UnionConvertor in the ObjectInspectorConvertors class has an issue with the convert method where it attempts to convert the objectinspector itself instead of converting the field [1]. This should be changed to convert the field itself. This could result in a ClassCastException as shown below: {code} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazyUnionObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveWritableObject(LazyStringObjectInspector.java:51) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:391) at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$TextConverter.convert(PrimitiveObjectInspectorConverter.java:338) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$UnionConverter.convert(ObjectInspectorConverters.java:456) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$MapConverter.convert(ObjectInspectorConverters.java:539) at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:395) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:127) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:518) ... 9 more {code} [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorConverters.java#L466 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699580#comment-14699580 ] Swarnim Kulkarni commented on HIVE-11513: - [~xuefuz] Are we good to merge this? AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, HIVE-11513.3.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null [1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
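The guard being requested is small: treat empty bytes like null instead of handing them to the Avro decoder. A stand-alone sketch of the shape (method names are illustrative, not the actual AvroLazyObjectInspector signature):

```java
public class EmptyAwareInit {
    // Shape of the HIVE-11513 guard: short-circuit on null *and* empty input.
    public static Object deserialize(byte[] data) {
        if (data == null || data.length == 0) {
            return null;  // nothing to decode; avoid a decoder failure
        }
        return decode(data);
    }

    private static Object decode(byte[] data) {
        return new String(data);  // placeholder for real Avro decoding
    }

    public static void main(String[] args) {
        System.out.println(deserialize(new byte[0]));  // null
        System.out.println(deserialize("x".getBytes()));
    }
}
```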
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699912#comment-14699912 ] Swarnim Kulkarni commented on HIVE-3725: [~leftylev] Mind giving me access to the wiki so I can document this and a couple others on the wiki? Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
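At scan time, the feature described above boils down to selecting, from all qualifiers present in a column family, those that share the mapped prefix, so they can surface together as one Hive MAP column. A self-contained sketch of that kernel (illustrative names and types, not the HBase handler's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixQualifiers {
    // Keep the qualifiers in a family whose names start with the mapped
    // prefix — the selection at the heart of HIVE-3725.
    public static Map<String, byte[]> selectByPrefix(Map<String, byte[]> familyCells,
                                                     String prefix) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : familyCells.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("tag_color", new byte[] {1});
        cells.put("tag_size", new byte[] {2});
        cells.put("owner", new byte[] {3});
        System.out.println(selectByPrefix(cells, "tag_").keySet()); // [tag_color, tag_size]
    }
}
```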
[jira] [Commented] (HIVE-3725) Add support for pulling HBase columns with prefixes
[ https://issues.apache.org/jira/browse/HIVE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699952#comment-14699952 ] Swarnim Kulkarni commented on HIVE-3725: Thanks [~leftylev]. My confluence username is swarnim Add support for pulling HBase columns with prefixes --- Key: HIVE-3725 URL: https://issues.apache.org/jira/browse/HIVE-3725 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Labels: TODOC12 Fix For: 0.12.0 Attachments: HIVE-3725.1.patch.txt, HIVE-3725.2.patch.txt, HIVE-3725.3.patch.txt, HIVE-3725.4.patch.txt, HIVE-3725.patch.3.txt Current HBase Hive integration supports reading many values from the same row by specifying a column family. And specifying just the column family can pull in all qualifiers within the family. We should add in support to be able to specify a prefix for the qualifier and all columns that start with the prefix would automatically get pulled in. A wildcard support would be ideal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8898) Remove HIVE-8874 once HBASE-12493 is fixed
[ https://issues.apache.org/jira/browse/HIVE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697428#comment-14697428 ] Swarnim Kulkarni commented on HIVE-8898: I logged a JIRA here [1] to revert the work done. [1] https://issues.apache.org/jira/browse/HIVE-11559 Remove HIVE-8874 once HBASE-12493 is fixed -- Key: HIVE-8898 URL: https://issues.apache.org/jira/browse/HIVE-8898 Project: Hive Issue Type: Task Components: HBase Handler Reporter: Brock Noland Assignee: Yongzhi Chen Priority: Blocker Fix For: 1.2.0 Attachments: HIVE-8898.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11560: Attachment: HIVE-11560.1.patch.txt Patch attached. Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11560.1.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697482#comment-14697482 ] Swarnim Kulkarni commented on HIVE-11561: - I actually logged this one, but it seems like it's not really needed as we decided to move master forward to 1.x. So we would keep this NP change on master and then upgrade to HBase 1.x as part of [1]. Marking as won't fix. [1] https://issues.apache.org/jira/browse/HIVE-10491 Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-11561. - Resolution: Won't Fix Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11559) Revert work done in HIVE-8898
[ https://issues.apache.org/jira/browse/HIVE-11559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11559: --- Assignee: Swarnim Kulkarni Revert work done in HIVE-8898 - Key: HIVE-11559 URL: https://issues.apache.org/jira/browse/HIVE-11559 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni We unfortunately need to revert the work done in HIVE-8898 as it is non-passive with the older hbase versions. We need to revert this from branch-1 and commit this onto master to maintain passivity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11560: --- Assignee: Swarnim Kulkarni Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11561) Patch for master
[ https://issues.apache.org/jira/browse/HIVE-11561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11561: --- Assignee: Swarnim Kulkarni Patch for master Key: HIVE-11561 URL: https://issues.apache.org/jira/browse/HIVE-11561 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11560) Patch for branch-1
[ https://issues.apache.org/jira/browse/HIVE-11560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14697466#comment-14697466 ] Swarnim Kulkarni commented on HIVE-11560: - [~sershe] Mind reviewing this for me? Patch for branch-1 -- Key: HIVE-11560 URL: https://issues.apache.org/jira/browse/HIVE-11560 Project: Hive Issue Type: Sub-task Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11560.1.patch.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8973) Add support for pulling HBase columns with regex matching
[ https://issues.apache.org/jira/browse/HIVE-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-8973:
-----------------------------------
    Component/s: HBase Handler

> Add support for pulling HBase columns with regex matching
> ---------------------------------------------------------
>
>                 Key: HIVE-8973
>                 URL: https://issues.apache.org/jira/browse/HIVE-8973
>             Project: Hive
>          Issue Type: Improvement
>          Components: Clients, HBase Handler
>            Reporter: Sucaz Moshe
>            Priority: Trivial
>
> Hi, we would like to create a table that pulls HBase columns with regex matching. For example:
> CREATE EXTERNAL TABLE XXX (
>   key string,
>   DATES MAP<STRING, BIGINT>,
>   FLOATS MAP<STRING, DOUBLE>,
>   STRINGS MAP<STRING, STRING>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, FECF:D_[0-9]*, FECF:F_[0-9]*, FECF:C[0-9]*_[0-9]*")
> TBLPROPERTIES ("hbase.table.name" = "XXX", "hbase.table.default.storage.type" = "binary");
> Currently only prefixes work (with Hive 0.12.0):
> CREATE EXTERNAL TABLE XXX (
>   key string,
>   DATES MAP<STRING, BIGINT>,
>   FLOATS MAP<STRING, DOUBLE>,
>   STRINGS MAP<STRING, STRING>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key, FECF:D_.*, FECF:F_.*, FECF:C.*", "hbase.table.default.storage.type" = "binary")
> TBLPROPERTIES ("hbase.table.name" = "XXX");

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
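Generalizing from prefix matching to the requested regex matching would change only the qualifier-selection predicate: a compiled pattern instead of a startsWith check. An illustrative stand-alone sketch of that predicate (not the HBase handler's actual API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RegexQualifiers {
    // Keep the qualifiers in a family whose names fully match a regex —
    // the selection the HIVE-8973 request would need at scan time.
    public static Map<String, byte[]> selectByPattern(Map<String, byte[]> familyCells,
                                                      String regex) {
        Pattern p = Pattern.compile(regex);
        Map<String, byte[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : familyCells.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, byte[]> cells = new LinkedHashMap<>();
        cells.put("D_2014", new byte[0]);
        cells.put("D_2015", new byte[0]);
        cells.put("F_1", new byte[0]);
        System.out.println(selectByPattern(cells, "D_[0-9]*").keySet()); // [D_2014, D_2015]
    }
}
```

Note the design tension the ticket hints at: prefix matching can be pushed down to HBase efficiently via a scan prefix, while an arbitrary regex generally forces examining every qualifier in the family.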
[jira] [Commented] (HIVE-11539) Fix flaky hive tests
[ https://issues.apache.org/jira/browse/HIVE-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695235#comment-14695235 ] Swarnim Kulkarni commented on HIVE-11539: - Hm. This seems like a little bit more work than expected. Mostly because the transaction tables are scattered all over the place. We should be able to refactor some of this and channel all of it through the TxnDbUtil class for consistency. Fix flaky hive tests Key: HIVE-11539 URL: https://issues.apache.org/jira/browse/HIVE-11539 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11539.1.patch.txt Following tests seem flaky and should be fixed. {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8898) Remove HIVE-8874 once HBASE-12493 is fixed
[ https://issues.apache.org/jira/browse/HIVE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693618#comment-14693618 ] Swarnim Kulkarni commented on HIVE-8898: It seems like we might have introduced a regression [1] with this patch. I am digging into what might have caused this and how we can fix this. [1] https://issues.apache.org/jira/browse/HIVE-8898 Remove HIVE-8874 once HBASE-12493 is fixed -- Key: HIVE-8898 URL: https://issues.apache.org/jira/browse/HIVE-8898 Project: Hive Issue Type: Task Components: HBase Handler Reporter: Brock Noland Assignee: Yongzhi Chen Priority: Blocker Fix For: 1.2.0 Attachments: HIVE-8898.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10990) Compatibility Hive-1.2 an hbase-1.0.1.1
[ https://issues.apache.org/jira/browse/HIVE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693672#comment-14693672 ] Swarnim Kulkarni commented on HIVE-10990:
-----------------------------------------

[~gurmukhd] Digging deeper, I am unfortunately not seeing anything non-passive between the two versions which might be causing this error. The HBaseStorageHandler makes a call to HTableDescriptor#addFamily [1][2]. Would you be able to provide me with a full stack trace so I can dig deeper into this? That said, we still need to do full-scale compatibility testing with HBase 1.0. Stay tuned.

[1] https://github.com/apache/hive/blob/release-1.2.0/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java#L214
[2] https://github.com/apache/hbase/blob/1.0.1/hbase-client/src/main/java/org/apache/hadoop/hbase/HTableDescriptor.java#L786

> Compatibility Hive-1.2 an hbase-1.0.1.1
> ---------------------------------------
>
>                 Key: HIVE-10990
>                 URL: https://issues.apache.org/jira/browse/HIVE-10990
>             Project: Hive
>          Issue Type: Bug
>          Components: Beeline, HBase Handler, HiveServer2
>    Affects Versions: 1.2.0
>            Reporter: gurmukh singh
>            Assignee: Swarnim Kulkarni
>
> Hive external table works fine with HBase. Hive-1.2, hbase-1.0.1.1, hadoop-2.5.2.
> Not able to create a table from Hive in HBase.
> 1: jdbc:hive2://edge1.dilithium.com:1/def TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
> Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V (state=08S01,code=1)
> [hdfs@edge1 cluster]$ hive
> 2015-06-12 17:56:49,952 WARN [main] conf.HiveConf: HiveConf of name hive.metastore.local does not exist
> Logging initialized using configuration in jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/lib/hive-common-1.2.0.jar!/hive-log4j.properties
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/local/cluster/apache-hive-1.2.0-bin/auxlib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/cluster/hadoop-2.5.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> hive> CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "xyz");
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
> === scan complete in 1535ms
> 14 driver classes found
> Compliant  Version  Driver Class
> no         5.1      com.mysql.jdbc.Driver
> no         5.1      com.mysql.jdbc.NonRegisteringDriver
> no         5.1      com.mysql.jdbc.NonRegisteringReplicationDriver
> no         5.1      com.mysql.jdbc.ReplicationDriver
> yes        1.2      org.apache.calcite.avatica.remote.Driver
> yes        1.2      org.apache.calcite.jdbc.Driver
> yes        1.0      org.apache.commons.dbcp.PoolingDriver
> yes        10.11    org.apache.derby.jdbc.AutoloadedDriver
> yes        10.11    org.apache.derby.jdbc.Driver42
> yes        10.11    org.apache.derby.jdbc.EmbeddedDriver
> yes        10.11    org.apache.derby.jdbc.InternalDriver
> no         1.2      org.apache.hive.jdbc.HiveDriver
> yes        1.0      org.datanucleus.store.rdbms.datasource.dbcp.PoolingDriver
> no         5.1      org.gjt.mm.mysql.Driver

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
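One classic cause of a NoSuchMethodError whose descriptor ends in `)V` is a return-type change: code compiled against a void-returning addFamily will not link against a jar where the method returns something else, even though the source looks identical. This is one plausible reading of the failure above, not a diagnosis confirmed in the thread. A self-contained sketch, using hypothetical stand-in classes (not the real HTableDescriptor), of how a reflection probe distinguishes the two shapes:

```java
import java.lang.reflect.Method;

public class DescriptorCheck {
    // Hypothetical stand-ins for two generations of the same class: one
    // declares addFamily returning void, the other returns itself (fluent).
    static class OldDescriptor {
        void addFamily(String f) { }
    }
    static class NewDescriptor {
        NewDescriptor addFamily(String f) { return this; }
    }

    // True when addFamily(String) returns void — the "(...)V" shape named in
    // the NoSuchMethodError quoted in the report above.
    public static boolean addFamilyReturnsVoid(Class<?> c) {
        try {
            Method m = c.getDeclaredMethod("addFamily", String.class);
            return m.getReturnType() == void.class;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addFamilyReturnsVoid(OldDescriptor.class)); // true
        System.out.println(addFamilyReturnsVoid(NewDescriptor.class)); // false
    }
}
```

Because the JVM resolves calls by full descriptor (parameters *and* return type), recompiling against the runtime's HBase jar, or probing via reflection as above, are the usual ways out of this class of mismatch.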
[jira] [Updated] (HIVE-5233) move hbase storage handler to org.apache.hcatalog package
[ https://issues.apache.org/jira/browse/HIVE-5233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5233: --- Component/s: HBase Handler move hbase storage handler to org.apache.hcatalog package - Key: HIVE-5233 URL: https://issues.apache.org/jira/browse/HIVE-5233 Project: Hive Issue Type: Sub-task Components: HBase Handler, HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.12.0 Attachments: 5233.move, 5233.update, HIVE-5233.patch org.apache.hcatalog in hcatalog/storage-handlers/ was erroneously renamed to org.apache.hive.hcatalog in HIVE-4895. This should be reverted as this module is deprecated and should continue to exist in org.apache.hcatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-791) udf to lookup hbase table
[ https://issues.apache.org/jira/browse/HIVE-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693708#comment-14693708 ] Swarnim Kulkarni commented on HIVE-791: --- Please refer to the external table support to retrieve data from an existing HBase table. Wiki docs are here: * [LanguageManual DDL -- External Tables | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ExternalTables] * [Design Docs -- Storage Handlers | https://cwiki.apache.org/confluence/display/Hive/StorageHandlers] udf to lookup hbase table - Key: HIVE-791 URL: https://issues.apache.org/jira/browse/HIVE-791 Project: Hive Issue Type: New Feature Components: HBase Handler Reporter: Raghotham Murthy Assignee: John Sichi Retrieve the latest value: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier) Retrieve for a specific timestamp: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier, ts) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-791) udf to lookup hbase table
[ https://issues.apache.org/jira/browse/HIVE-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-791. --- Resolution: Won't Fix udf to lookup hbase table - Key: HIVE-791 URL: https://issues.apache.org/jira/browse/HIVE-791 Project: Hive Issue Type: New Feature Components: HBase Handler Reporter: Raghotham Murthy Assignee: John Sichi Retrieve the latest value: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier) Retrieve for a specific timestamp: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier, ts) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-791) udf to lookup hbase table
[ https://issues.apache.org/jira/browse/HIVE-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693709#comment-14693709 ] Swarnim Kulkarni commented on HIVE-791: --- Resolving as Won't Fix. udf to lookup hbase table - Key: HIVE-791 URL: https://issues.apache.org/jira/browse/HIVE-791 Project: Hive Issue Type: New Feature Components: HBase Handler Reporter: Raghotham Murthy Assignee: John Sichi Retrieve the latest value: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier) Retrieve for a specific timestamp: hbase_get('hbase_table_name', rowid, columnFamily, columnIdentifier, ts) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694303#comment-14694303 ] Swarnim Kulkarni commented on HIVE-11513: - The test failures are unrelated. Logged [1] to look at why they are flaky. [1] https://issues.apache.org/jira/browse/HIVE-11539 AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, HIVE-11513.3.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null[1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
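The null-only guard the issue description points at can be illustrated with a small, self-contained sketch. The class and method names below are hypothetical (this is not the actual AvroLazyObjectInspector code); it only shows the distinction the ticket is about: treating a zero-length payload the same way as an absent one, instead of handing an empty buffer to the Avro decoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the guard discussed above. The real Hive code
// differs; this only models the null-vs-empty distinction.
public class EmptyDataGuard {

    // Return null (which Hive would surface as SQL NULL) for both absent
    // and zero-length payloads, instead of attempting deserialization.
    static byte[] toDeserializableOrNull(byte[] data) {
        if (data == null || data.length == 0) {
            return null;
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(toDeserializableOrNull(null));          // null
        System.out.println(toDeserializableOrNull(new byte[0]));   // null
        byte[] payload = "avro".getBytes(StandardCharsets.UTF_8);
        // Non-empty payloads pass through untouched.
        System.out.println(toDeserializableOrNull(payload) == payload); // true
    }
}
```

The point of folding the empty case into the null branch is that downstream code then has a single "no data" condition to check, rather than failing later inside the decoder.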
[jira] [Updated] (HIVE-11539) Fix flaky hive tests
[ https://issues.apache.org/jira/browse/HIVE-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11539: Summary: Fix flaky hive tests (was: Fix flaky hive test) Fix flaky hive tests Key: HIVE-11539 URL: https://issues.apache.org/jira/browse/HIVE-11539 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Following tests seem flaky and should be fixed. {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8898) Remove HIVE-8874 once HBASE-12493 is fixed
[ https://issues.apache.org/jira/browse/HIVE-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694258#comment-14694258 ] Swarnim Kulkarni commented on HIVE-8898: Damn lol. [1] is the JIRA. I did dig into it a bit more and commented on it and seems like the regression might not be as bad as I thought it would be. Regardless I am still of the opinion of reverting this and moving this over to 2.x. [1] https://issues.apache.org/jira/browse/HIVE-10990 Remove HIVE-8874 once HBASE-12493 is fixed -- Key: HIVE-8898 URL: https://issues.apache.org/jira/browse/HIVE-8898 Project: Hive Issue Type: Task Components: HBase Handler Reporter: Brock Noland Assignee: Yongzhi Chen Priority: Blocker Fix For: 1.2.0 Attachments: HIVE-8898.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11539) Fix flaky hive tests
[ https://issues.apache.org/jira/browse/HIVE-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11539: Attachment: HIVE-11539.1.patch.txt Saw this TestStreaming failure on a bunch of other JIRAs as well. Hopefully this patch fixes all of them. Fix flaky hive tests Key: HIVE-11539 URL: https://issues.apache.org/jira/browse/HIVE-11539 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11539.1.patch.txt Following tests seem flaky and should be fixed. {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11539) Fix flaky hive tests
[ https://issues.apache.org/jira/browse/HIVE-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694385#comment-14694385 ] Swarnim Kulkarni commented on HIVE-11539: - It seems like they passed locally. The flakiness is mostly due to the cleanup logic in there. {noformat} --- T E S T S --- Running org.apache.hive.hcatalog.streaming.TestStreaming Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 43.229 sec - in org.apache.hive.hcatalog.streaming.TestStreaming Results : Tests run: 12, Failures: 0, Errors: 0, Skipped: 0 {noformat} Fix flaky hive tests Key: HIVE-11539 URL: https://issues.apache.org/jira/browse/HIVE-11539 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Following tests seem flaky and should be fixed. {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-11539) Fix flaky hive tests
[ https://issues.apache.org/jira/browse/HIVE-11539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni reassigned HIVE-11539: --- Assignee: Swarnim Kulkarni Fix flaky hive tests Key: HIVE-11539 URL: https://issues.apache.org/jira/browse/HIVE-11539 Project: Hive Issue Type: Bug Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Following tests seem flaky and should be fixed. {noformat} org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11513) AvroLazyObjectInspector could handle empty data better
[ https://issues.apache.org/jira/browse/HIVE-11513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694406#comment-14694406 ] Swarnim Kulkarni commented on HIVE-11513: - Submitted patch on HIVE-11539 to fix the flaky tests. AvroLazyObjectInspector could handle empty data better -- Key: HIVE-11513 URL: https://issues.apache.org/jira/browse/HIVE-11513 Project: Hive Issue Type: Improvement Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-11513.1.patch.txt, HIVE-11513.2.patch.txt, HIVE-11513.3.patch.txt Currently in the AvroLazyObjectInspector, it looks like we only handle the case when the data sent to deserialize is null[1]. It would be nice to handle the case when it is empty. [1] https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L226-L228 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3983) Select on table with hbase storage handler fails with an SASL error
[ https://issues.apache.org/jira/browse/HIVE-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694736#comment-14694736 ] Swarnim Kulkarni commented on HIVE-3983: [~amalakar] Like I mentioned, I tested this out on a kerberos enabled cluster and was able to create and run queries just fine. This issue should have been fixed with HIVE-8874 where we added the capability to retrieve the auth tokens from HBase. Please feel free to open another one if you continue to see issues. Select on table with hbase storage handler fails with an SASL error --- Key: HIVE-3983 URL: https://issues.apache.org/jira/browse/HIVE-3983 Project: Hive Issue Type: Bug Components: HBase Handler, Security Environment: hive-0.10 hbase-0.94.5.5 hadoop-0.23.3.1 hcatalog-0.5 Reporter: Arup Malakar Assignee: Swarnim Kulkarni The table is created using the following query: {code} CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = xyz); {code} Doing a select on the table launches a map-reduce job. But the job fails with the following error: {code} 2013-02-02 01:31:07,500 FATAL [IPC Server handler 3 on 40118] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1348093718159_1501_m_00_0 - exited : java.io.IOException: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. 
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:160)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:242)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
	at org.apache.hadoop.hbase.security.User.call(User.java:590)
	at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444)
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:203)
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:291)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
	at $Proxy12.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291)
	at
[jira] [Commented] (HIVE-8871) Hive Hbase Integration : Support for NULL value columns
[ https://issues.apache.org/jira/browse/HIVE-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694725#comment-14694725 ] Swarnim Kulkarni commented on HIVE-8871: [~jasperknulst] If I understand this request right, you are proposing that we represent columns which do not have values as empty jsons? Personally I think that is ok but wouldn't that make it harder to differentiate between columns which do not have any values vs those which have actual empty values? For instance, today you can run a IS NULL query and quickly figure out the columns which do not have any values. Wouldn't figuring things like this become harder with the pattern that you are proposing? Hive Hbase Integration : Support for NULL value columns Key: HIVE-8871 URL: https://issues.apache.org/jira/browse/HIVE-8871 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.10.0 Reporter: Jasper Knulst Labels: features If you map a Hive column to a Hbase CF where the CF only has qualifiers but no values, Hive always outputs ' {} ' for that key. This hides the fact that qualifiers do exist within the CF. As soon as you put a single byte (like a space) as value you'll get a return like this ' {20140911, } in Hive. Since it is a common data modelling technique in Hbase to not use the value (and essentially use the qualifier in a CF as value holder) I think it would be worthwhile to have some support for this in the Hbase handler. A solution could be to show a data structure like CF:qualifier:NULL like this: {20140911,} , where '20140911' is the qualifier and NULL value in Hbase are shown as empty json strings. 
{code}
CREATE EXTERNAL TABLE hb_test (
  userhash string,
  count bigint,
  dates map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,SUM:COUNT,DATES:",
  "hbase.table.default.storage.type" = "binary")
TBLPROPERTIES ("hbase.table.name" = "test");
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
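The trade-off raised in the comment can be sketched with plain Java maps. This is an illustrative model only (names and the qualifier value are hypothetical, not Hive/HBase handler code): once qualifier-only cells surface with the qualifier kept and a null value, an emptiness check no longer distinguishes "no qualifiers at all" from "qualifiers without values".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Models the two representations discussed above for a CF whose
// qualifier "20140911" carries no value. Not actual handler code.
public class QualifierOnlyCell {

    // Today: the value-less qualifier is dropped and the map comes back empty.
    static Map<String, String> currentBehavior() {
        return new LinkedHashMap<>();
    }

    // Proposed: keep the qualifier and map it to null.
    static Map<String, String> proposedBehavior() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("20140911", null);
        return m;
    }

    public static void main(String[] args) {
        // An IS NULL style "no data" check based on emptiness works today...
        System.out.println(currentBehavior().isEmpty());   // true
        // ...but would stop flagging qualifier-only rows under the proposal.
        System.out.println(proposedBehavior().isEmpty());  // false
    }
}
```

This is the ambiguity the comment asks about: the proposal preserves more information from HBase, at the cost of making "column has no values" harder to query for.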
[jira] [Resolved] (HIVE-3983) Select on table with hbase storage handler fails with an SASL error
[ https://issues.apache.org/jira/browse/HIVE-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni resolved HIVE-3983. Resolution: Duplicate Select on table with hbase storage handler fails with an SASL error --- Key: HIVE-3983 URL: https://issues.apache.org/jira/browse/HIVE-3983 Project: Hive Issue Type: Bug Components: HBase Handler, Security Environment: hive-0.10 hbase-0.94.5.5 hadoop-0.23.3.1 hcatalog-0.5 Reporter: Arup Malakar Assignee: Swarnim Kulkarni The table is created using the following query: {code} CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES (hbase.columns.mapping = :key,cf1:val) TBLPROPERTIES (hbase.table.name = xyz); {code} Doing a select on the table launches a map-reduce job. But the job fails with the following error: {code} 2013-02-02 01:31:07,500 FATAL [IPC Server handler 3 on 40118] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1348093718159_1501_m_00_0 - exited : java.io.IOException: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'. 
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
	at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:243)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:522)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:160)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:381)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
Caused by: java.lang.RuntimeException: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection$1.run(SecureClient.java:242)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37)
	at org.apache.hadoop.hbase.security.User.call(User.java:590)
	at org.apache.hadoop.hbase.security.User.access$700(User.java:51)
	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:444)
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.handleSaslConnectionFailure(SecureClient.java:203)
	at org.apache.hadoop.hbase.ipc.SecureClient$SecureConnection.setupIOstreams(SecureClient.java:291)
	at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1124)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:974)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Invoker.invoke(SecureRpcEngine.java:104)
	at $Proxy12.getProtocolVersion(Unknown Source)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine.getProxy(SecureRpcEngine.java:146)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1335)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1291)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:1278)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:987)
	at
[jira] [Commented] (HIVE-11537) Branch-1 build is broken
[ https://issues.apache.org/jira/browse/HIVE-11537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693883#comment-14693883 ] Swarnim Kulkarni commented on HIVE-11537: - +1 (NB) Branch-1 build is broken Key: HIVE-11537 URL: https://issues.apache.org/jira/browse/HIVE-11537 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-11537.01.patch {code} [240,19] cannot find symbol symbol: class Operator location: class org.apache.hadoop.hive.ql.optimizer.StatsOptimizer.MetaDataProcessor [INFO] 3 errors [INFO] - [INFO] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)