Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Swarnim Kulkarni


 On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
  192
  https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192
 
  This seems like a limited case of pattern matching. Swarnim, any way we 
  can support generic regex matching instead?

Mark, in this case I specifically wanted to only allow strings that end with 
exactly the character * and using String#endsWith seemed simpler and more 
readable than a regex. Do you still want me to replace this with regex 
matching?
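
For illustration, the endsWith-based check described above could look roughly 
like the sketch below. The class and method names are hypothetical and are not 
taken from the patch under review; the sketch only assumes that a qualifier 
ending in * means "every column whose name starts with the text before the *".

```java
// Hypothetical helper for illustration; not the code from the patch under review.
final class PrefixQualifier {
    private PrefixQualifier() {}

    /** True when the qualifier is a prefix pattern, i.e. it ends with "*". */
    static boolean isPrefixPattern(String qualifier) {
        return qualifier != null && qualifier.endsWith("*");
    }

    /** Drops the trailing "*", so "col*" yields the prefix "col". */
    static String prefixOf(String qualifier) {
        if (!isPrefixPattern(qualifier)) {
            throw new IllegalArgumentException("not a prefix pattern: " + qualifier);
        }
        return qualifier.substring(0, qualifier.length() - 1);
    }

    /** Prefix patterns match by startsWith; anything else must match exactly. */
    static boolean matches(String qualifier, String columnName) {
        if (isPrefixPattern(qualifier)) {
            return columnName.startsWith(prefixOf(qualifier));
        }
        return columnName.equals(qualifier);
    }
}
```

With this sketch, matches("col*", "col1") is true, while matches("col*", 
"other") and matches("col", "col1") are false.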


- Swarnim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
---


On Feb. 3, 2013, 1:04 a.m., Swarnim Kulkarni wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/9276/
 ---
 
 (Updated Feb. 3, 2013, 1:04 a.m.)
 
 
 Review request for hive.
 
 
 Description
 ---
 
 Added support for pulling HBase columns just by providing a prefix and a 
 wildcard. So a query could now look something like this:
 
 CREATE EXTERNAL TABLE hive_hbase_test
 ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
 STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
 WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
 TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");
 
 This would pull in all columns under column family fam1 whose names start 
 with col. This gives a little more flexibility than the pull-all-columns 
 format.
 
 
 This addresses bug HIVE-3725.
 https://issues.apache.org/jira/browse/HIVE-3725
 
 
 Diffs
 -
 
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
   hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
   hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 
 
 Diff: https://reviews.apache.org/r/9276/diff/
 
 
 Testing
 ---
 
 Added unit tests to demonstrate the new functionality. Also made sure that 
 all existing unit tests passed.
 
 
 Thanks,
 
 Swarnim Kulkarni
 




Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Brock Noland


 On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
  192
  https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192
 
  This seems like a limited case of pattern matching. Swarnim, any way we 
  can support generic regex matching instead?
 
 Swarnim Kulkarni wrote:
 Mark, in this case I specifically wanted to only allow strings that end 
 with exactly the character * and using String#endsWith seemed simpler and 
 more readable than a regex. Do you still want me to replace this with regex 
 matching?

I think the issue is that this would make it difficult to implement enhanced 
pattern matching later. If regex matching were implemented now, you'd only 
need to specify:

col.*

in the table configuration. The remaining issue would be detecting whether a 
particular column is a regex pattern. Also, because #, comma, and : are used as 
separators, those characters could not be used in a pattern.
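
Both concerns can be made concrete with a small sketch. It is illustrative 
only: the class name, the method names, and the set of characters treated as 
regex metacharacters are assumptions, not part of Hive. The first method is one 
possible heuristic for deciding whether a qualifier is a pattern; the second 
shows how a comma inside a pattern collides with the comma that separates 
mapping entries.

```java
// Hypothetical helper for illustration; not part of Hive or of the patch.
final class QualifierSyntax {
    private QualifierSyntax() {}

    // Assumed set of characters that mark a qualifier as a regex (illustrative only).
    private static final String REGEX_META = "\\[](){}.*+?^$|";

    /** Heuristic: a qualifier containing any metacharacter is treated as a regex. */
    static boolean looksLikeRegex(String qualifier) {
        for (int i = 0; i < qualifier.length(); i++) {
            if (REGEX_META.indexOf(qualifier.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Number of entries produced when the mapping is split on the comma separator. */
    static int entryCount(String columnsMapping) {
        return columnsMapping.split(",").length;
    }
}
```

Under this heuristic "col.*" is a pattern and "col1" is a literal, but a 
literal qualifier such as "a.b" would also be classified as a pattern, which is 
the detection problem. Likewise, ":key,fam1:col{1,3}" splits into three entries 
rather than two, so a quantifier containing a comma cannot be written in the 
mapping.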


- Brock






Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Swarnim Kulkarni


 On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
  192
  https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192
 
  This seems like a limited case of pattern matching. Swarnim, any way we 
  can support generic regex matching instead?
 
 Swarnim Kulkarni wrote:
 Mark, in this case I specifically wanted to only allow strings that end 
 with exactly the character * and using String#endsWith seemed simpler and 
 more readable than a regex. Do you still want me to replace this with regex 
 matching?
 
 Brock Noland wrote:
 I think the issue is that this would make it difficult to implement 
 enhanced pattern matching later. If regex matching were implemented now, 
 you'd only need to specify:
 
 col.*
 
 in the table configuration. The remaining issue would be detecting whether 
 a particular column is a regex pattern. Also, because #, comma, and : are 
 used as separators, those characters could not be used in a pattern.

Thanks Brock. Makes sense. To be sure I understand you correctly, the change 
now would be just to replace parts[1].endsWith("*") with something more regexy 
that would still imply that the string ends with *. Correct?


- Swarnim






Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Mark Grover


 On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
  192
  https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192
 
  This seems like a limited case of pattern matching. Swarnim, any way we 
  can support generic regex matching instead?
 
 Swarnim Kulkarni wrote:
 Mark, in this case I specifically wanted to only allow strings that end 
 with exactly the character * and using String#endsWith seemed simpler and 
 more readable than a regex. Do you still want me to replace this with regex 
 matching?
 
 Brock Noland wrote:
 I think the issue is that this would make it difficult to implement 
 enhanced pattern matching later. If regex matching were implemented now, 
 you'd only need to specify:
 
 col.*
 
 in the table configuration. The remaining issue would be detecting whether 
 a particular column is a regex pattern. Also, because #, comma, and : are 
 used as separators, those characters could not be used in a pattern.
 
 Swarnim Kulkarni wrote:
 Thanks Brock. Makes sense. To be sure I understand you correctly, the 
 change now would be just to replace parts[1].endsWith("*") with something 
 more regexy that would still imply that the string ends with *. Correct?

I think that should do it.

Personally, I think having limited regex matching is just going to confuse 
people, so if you could implement (and test) full Java-style regex matching 
(like we do for RegexSerDe, for example), that would be fantastic. Of course, 
let me know if you have questions!
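
As a rough illustration of what full Java-style regex matching on column 
qualifiers might involve, the sketch below compiles the pattern once with 
java.util.regex.Pattern and keeps the qualifiers that match it in full. The 
class and method names are hypothetical; this is not the code in the review.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the code from the patch under review.
final class RegexColumnFilter {
    private final Pattern pattern;

    /** Compiles the qualifier regex once so it can be reused for every row. */
    RegexColumnFilter(String qualifierRegex) {
        this.pattern = Pattern.compile(qualifierRegex);
    }

    /** Returns the qualifiers that match the whole regex, in their original order. */
    List<String> select(Collection<String> qualifiers) {
        List<String> selected = new ArrayList<>();
        for (String qualifier : qualifiers) {
            if (pattern.matcher(qualifier).matches()) {
                selected.add(qualifier);
            }
        }
        return selected;
    }
}
```

Here "col.*" reproduces the prefix behaviour, while a pattern such as 
"col[0-9]+" expresses something the trailing-* form cannot.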

Thanks for doing this, BTW!


- Mark






Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
---

(Updated Feb. 9, 2013, 9:56 p.m.)


Review request for hive.


Changes
---

Updated diff with the proposed changes.


Description
---

Added support for pulling HBase columns just by providing a prefix and a 
wildcard. So a query could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");

This would pull in all columns under column family fam1 whose names start with 
col. This gives a little more flexibility than the pull-all-columns format.


This addresses bug HIVE-3725.
https://issues.apache.org/jira/browse/HIVE-3725


Diffs (updated)
-

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
---

Added unit tests to demonstrate the new functionality. Also made sure that all 
existing unit tests passed.


Thanks,

Swarnim Kulkarni



Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-09 Thread Swarnim Kulkarni


 On Feb. 5, 2013, 3:43 a.m., Mark Grover wrote:
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java, line 
  192
  https://reviews.apache.org/r/9276/diff/1/?file=254957#file254957line192
 
  This seems like a limited case of pattern matching. Swarnim, any way we 
  can support generic regex matching instead?
 
 Swarnim Kulkarni wrote:
 Mark, in this case I specifically wanted to only allow strings that end 
 with exactly the character * and using String#endsWith seemed simpler and 
 more readable than a regex. Do you still want me to replace this with regex 
 matching?
 
 Brock Noland wrote:
 I think the issue is that this would make it difficult to implement 
 enhanced pattern matching later. If regex matching were implemented now, 
 you'd only need to specify:
 
 col.*
 
 in the table configuration. The remaining issue would be detecting whether 
 a particular column is a regex pattern. Also, because #, comma, and : are 
 used as separators, those characters could not be used in a pattern.
 
 Swarnim Kulkarni wrote:
 Thanks Brock. Makes sense. To be sure I understand you correctly, the 
 change now would be just to replace parts[1].endsWith("*") with something 
 more regexy that would still imply that the string ends with *. Correct?
 
 Mark Grover wrote:
 I think that should do it.
 
 Personally, I think having limited regex matching is just going to 
 confuse people, so if you could implement (and test) full Java-style regex 
 matching (like we do for RegexSerDe, for example), that would be fantastic. 
 Of course, let me know if you have questions!
 
 Thanks for doing this, BTW!

Thanks for the suggestions. I incorporated them and updated the review. If you 
get a chance, please let me know if they look any better.


- Swarnim






Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-04 Thread Mark Grover

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/#review16080
---



hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java
https://reviews.apache.org/r/9276/#comment34401

This seems like a limited case of pattern matching. Swarnim, any way we can 
support generic regex matching instead?


- Mark Grover






Review Request: Add support for pulling HBase columns with prefixes

2013-02-02 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
---

Review request for hive.


Description
---

Added support for pulling HBase columns just by providing a prefix and a 
wildcard. So a query could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_NEW_KEPLER_TABLE");

This would pull in all columns under column family fam1 whose names start with 
col. This gives a little more flexibility than the pull-all-columns format.
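
To make the mapping syntax concrete, here is a minimal sketch of how a value 
such as ":key,fam1:col*" breaks down into a row-key entry and a 
family/qualifier pair. It is not the parsing code in HBaseSerDe; the class, the 
nested Entry type, and the method names are hypothetical and exist only for 
this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for illustration; not the parsing code in HBaseSerDe.
final class ColumnMappingSketch {
    private ColumnMappingSketch() {}

    /** One entry of the mapping: either the row key or a family/qualifier pair. */
    static final class Entry {
        final String family;    // null for the row key entry
        final String qualifier; // null for the row key entry; may end with "*"

        Entry(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }

        boolean isRowKey() {
            return family == null;
        }
    }

    /** Splits the mapping on "," and each column entry on the first ":". */
    static List<Entry> parse(String columnsMapping) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : columnsMapping.split(",")) {
            String spec = raw.trim();
            if (spec.equals(":key")) {
                entries.add(new Entry(null, null));
                continue;
            }
            String[] parts = spec.split(":", 2);
            if (parts.length != 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException("bad column spec: " + spec);
            }
            entries.add(new Entry(parts[0], parts[1]));
        }
        return entries;
    }
}
```

For ":key,fam1:col*" the sketch yields two entries: the row key, and family 
fam1 with qualifier col*, where the trailing * is what marks the prefix match.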


This addresses bug HIVE-3725.
https://issues.apache.org/jira/browse/HIVE-3725


Diffs
-

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
---

Added unit tests to demonstrate the new functionality. Also made sure that all 
existing unit tests passed.


Thanks,

Swarnim Kulkarni



Re: Review Request: Add support for pulling HBase columns with prefixes

2013-02-02 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9276/
---

(Updated Feb. 3, 2013, 1:04 a.m.)


Review request for hive.


Changes
---

Updated description.


Description (updated)
---

Added support for pulling HBase columns just by providing a prefix and a 
wildcard. So a query could now look something like this:

CREATE EXTERNAL TABLE hive_hbase_test
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,fam1:col*") 
TBLPROPERTIES ("hbase.table.name" = "TEST_HBASE_TABLE");

This would pull in all columns under column family fam1 whose names start with 
col. This gives a little more flexibility than the pull-all-columns format.


This addresses bug HIVE-3725.
https://issues.apache.org/jira/browse/HIVE-3725


Diffs
-

  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 7f37ba5 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseCellMap.java a8ba9d9 
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java d35bb52 
  hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java e821282 

Diff: https://reviews.apache.org/r/9276/diff/


Testing
---

Added unit tests to demonstrate the new functionality. Also made sure that all 
existing unit tests passed.


Thanks,

Swarnim Kulkarni