[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-07-16 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066442#comment-13066442
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


I meant {{--no-prefix}} (not {{--no-patch}}).

+1. Will commit to 0.10.

 HBaseStorage constructor syntax is error prone
 --

 Key: PIG-1946
 URL: https://issues.apache.org/jira/browse/PIG-1946
 Project: Pig
  Issue Type: Improvement
Reporter: Bill Graham
Assignee: Bill Graham
 Fix For: 0.10

 Attachments: PIG-1946_1.patch, PIG-1946_2.patch, PIG-1946_3.patch


 Using {{HBaseStorage}} like so seems like a reasonable thing to do, but it 
 will yield unexpected results:
 {code}
 STORE result INTO 'hbase://foo' USING
  org.apache.pig.backend.hadoop.hbase.HBaseStorage(
  'info:first_name, info:last_name');
 {code}
 The problem is that a column named {{info:first_name,}} will be created, with 
 the trailing comma included. I've had numerous developers get tripped up on 
 this issue since everywhere else in Pig variables are separated by commas, so 
 I propose we fix it.
 I propose we trim leading/trailing commas from column names, but I'm open to 
 other ideas.
 Also, should we accept column names that are comma-delimited without spaces?
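
A minimal sketch of the problem and of the proposed trimming, assuming the constructor currently splits the column string on whitespace only. The class and method names are hypothetical and are not taken from the HBaseStorage source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: this is not the HBaseStorage source.
final class ColumnListSketch {

    // Assumed current behavior: split on whitespace only, so commas stay attached.
    static List<String> splitOnWhitespace(String columns) {
        return Arrays.asList(columns.trim().split("\\s+"));
    }

    // Proposed behavior: also strip leading/trailing commas from each token.
    static List<String> splitAndTrimCommas(String columns) {
        List<String> result = new ArrayList<>();
        for (String token : columns.trim().split("\\s+")) {
            String cleaned = token.replaceAll("^,+|,+$", "");
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String columns = "info:first_name, info:last_name";
        System.out.println(splitOnWhitespace(columns));   // [info:first_name,, info:last_name]
        System.out.println(splitAndTrimCommas(columns));  // [info:first_name, info:last_name]
    }
}
```

Note that trimming alone would still treat {{info:first_name,info:last_name}} (no space) as a single column, which is why the last question above matters.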

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-07-15 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065737#comment-13065737
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


Looks good. The test passed. The purist in me wants to make you do 
"true".equalsIgnoreCase(value) instead of (value == null || 
!value.equalsIgnoreCase("true")), and to pull out the column parsing behavior 
into its own function.
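
For reference, a small sketch of why the literal-first form needs no null check. The class and method names are hypothetical; only String.equalsIgnoreCase is a real API here.

```java
// Illustrative only: names are hypothetical, not from the patch.
final class BooleanOptionSketch {

    // Verbose form: true when the value is missing or is anything other than "true".
    static boolean isDisabledVerbose(String value) {
        return value == null || !value.equalsIgnoreCase("true");
    }

    // Compact form: calling equalsIgnoreCase on the literal is null-safe,
    // because String.equalsIgnoreCase(null) returns false instead of throwing.
    static boolean isEnabled(String value) {
        return "true".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        for (String value : new String[] {null, "true", "TRUE", "false", ""}) {
            // The verbose check is the exact negation of the compact one.
            System.out.println(value + " -> enabled=" + isEnabled(value)
                    + ", disabledVerbose=" + isDisabledVerbose(value));
        }
    }
}
```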

You generated the patch from git -- that's not friendly to the automated patch 
machinery; you have to use {code}git diff --no-patch{code} to generate legit 
patches.

If you have time to make the changes, that would be awesome. If not I will try 
to fit it in.

Thanks for the work on this!





[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-07-13 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064956#comment-13064956
 ] 

Thejas M Nair commented on PIG-1946:


bq. +1 on that last suggestion, sounds good to me. I'll work on the patch.
Removing the patch-available state as Bill is working on a new patch. 
The patch-available state is used to find JIRAs that are ready for 
review. 





[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-08 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046262#comment-13046262
 ] 

Bill Graham commented on PIG-1946:
--

+1 on that last suggestion, sounds good to me. I'll work on the patch.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-06 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044955#comment-13044955
 ] 

Eric Yang commented on PIG-1946:


Taking it one step further: what does -constructorDelimiter look like with a 
space delimiter?



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-06 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044969#comment-13044969
 ] 

Bill Graham commented on PIG-1946:
--

I was thinking any combination of comma and space would always be used to 
delimit by default. If -constructorDelimiter is specified we can either say: 
a.) same logic but sub comma out for -constructorDelimiter; or b.) Only use 
-constructorDelimiter as the delimiter in which case spaces would be part of 
the column descriptor if found.

I lean towards a.) but if allowing people to use spaces in column names is 
valid, then we should do b.).

Thoughts?
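
To make the two options concrete, here is a rough sketch, assuming a user-supplied delimiter of ";". The class and method names are hypothetical and nothing here is taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: method names are hypothetical, not from the patch.
final class DelimiterOptionsSketch {

    // Option (a): any run of the delimiter and/or whitespace separates columns.
    static List<String> optionA(String columns, String delimiter) {
        String regex = "(?:\\s|" + Pattern.quote(delimiter) + ")+";
        return nonEmpty(columns.split(regex));
    }

    // Option (b): only the delimiter separates columns; spaces stay in the names.
    static List<String> optionB(String columns, String delimiter) {
        return nonEmpty(columns.split(Pattern.quote(delimiter)));
    }

    // Drops empty tokens, e.g. the one produced by a leading delimiter.
    private static List<String> nonEmpty(String[] tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String columns = "info:first name; info:last name";
        System.out.println(optionA(columns, ";").size()); // 4: spaces also split
        System.out.println(optionB(columns, ";").size()); // 2: spaces are kept
    }
}
```

Under (a) a column name can never contain a space; under (b) it can, but the space after the delimiter also becomes part of the second name (" info:last name").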




[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-06 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044980#comment-13044980
 ] 

Eric Yang commented on PIG-1946:


a) should be the sensible thing to do.  b) is unlikely to mix well with the rest 
of Pig syntax anyway.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-05 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044664#comment-13044664
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


Coming back to this after a long break..
We can do the standard backslash escaping (with \\ meaning a real backslash) 
but that will look horrible in Pig code, where you have to escape each 
backslash for the parser first.
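
A rough sketch of what such backslash escaping could look like, assuming "\," means a literal comma and "\\" a literal backslash. This is one possible scheme with hypothetical names, not what was committed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one possible escaping scheme, not what was committed.
final class EscapedSplitSketch {

    // Splits on unescaped commas; a backslash makes the next character literal.
    static List<String> split(String columns) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < columns.length(); i++) {
            char c = columns.charAt(i);
            if (c == '\\' && i + 1 < columns.length()) {
                // Escape: take the next character literally.
                current.append(columns.charAt(++i));
            } else if (c == ',') {
                // Unescaped comma ends the current column name.
                result.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        result.add(current.toString().trim());
        return result;
    }

    public static void main(String[] args) {
        // Java source doubles every backslash, which hints at how noisy this gets.
        System.out.println(split("info:a\\,b, info:c"));   // [info:a,b, info:c]
        System.out.println(split("info:a\\\\, info:c"));   // [info:a\, info:c]
    }
}
```

The doubled backslashes in the Java literals show the readability cost: a string literal layer that needs its own escaping turns "\," into "\\," and "\\" into "\\\\".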

Maybe an optional -literalColumns flag that makes us revert to 
space-delimited column names? 

I don't really see an elegant way out of this right now.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-06-05 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044690#comment-13044690
 ] 

Bill Graham commented on PIG-1946:
--

How about we say that commas are the default delimiter (with or without 
spaces), but this can be overridden with the -constructorDelimiter param. That 
would allow those pesky comma users to specify some other character as their 
delimiter. Kinda like sed.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-05-13 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033148#comment-13033148
 ] 

Bill Graham commented on PIG-1946:
--

The column descriptors take anything I can throw at them:
{code}
hbase(main):001:0> create 't1', {NAME => 'f1', VERSIONS => 5}
0 row(s) in 0.6400 seconds

hbase(main):002:0> put 't1', 'r1', 'f1:!@#$%)(:+_-=\][{}|;:,./?`~', 'value'
0 row(s) in 0.0660 seconds
{code}

I'm also able to create column families with both '/' and '\' in them. Any 
suggestions for a valid encoding scheme?



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-05-10 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031050#comment-13031050
 ] 

Bill Graham commented on PIG-1946:
--

No problem. Dmitriy, have you found the list of valid characters for column 
descriptors? I feel like I've seen it before somewhere but I can't find it. 
After a quick test, I can write a column descriptor with any character I try, so 
I'm at a loss for a valid escaping scheme.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-05-10 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031065#comment-13031065
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


This is for column families: 
http://hbase.apache.org/xref/org/apache/hadoop/hbase/HColumnDescriptor.html#278 
(no slashes, colons, or ISOControl chars, and no starting with .).  I believe 
columns are similar.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-05-09 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031006#comment-13031006
 ] 

Dmitriy V. Ryaboy commented on PIG-1946:


Bill, sorry it took me a while to get to this. Looks good, but I just confirmed 
that commas are valid in column names, so we should add escaping.



[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone

2011-03-30 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013779#comment-13013779
 ] 

Eric Yang commented on PIG-1946:


An alternative is to modify the syntax like:

{noformat}
STORE result INTO 'hbase://foo' USING
 org.apache.pig.backend.hadoop.hbase.HBaseStorage(
 'info:first_name info:last_name');
{noformat}

This eliminates the comma from the syntax completely, although the approach 
may have some readability issues.

That said, the problem can be solved in a more user-friendly manner by 
improving the parser to strip leading and trailing commas and to handle 
comma-only tokens.

Maybe this issue can be defined more accurately as: Improve syntax parsing 
for HBaseStorage constructor.
