[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066442#comment-13066442 ]

Dmitriy V. Ryaboy commented on PIG-1946:

I meant {{--no-prefix}}. +1. Will commit to 0.10.

HBaseStorage constructor syntax is error prone
----------------------------------------------

Key: PIG-1946
URL: https://issues.apache.org/jira/browse/PIG-1946
Project: Pig
Issue Type: Improvement
Reporter: Bill Graham
Assignee: Bill Graham
Fix For: 0.10
Attachments: PIG-1946_1.patch, PIG-1946_2.patch, PIG-1946_3.patch

Using {{HBaseStorage}} like so seems like a reasonable thing to do, but it will yield unexpected results:

{code}
STORE result INTO 'hbase://foo' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'info:first_name, info:last_name');
{code}

The problem is that a column named {{info:first_name,}} will be created, with the trailing comma included. I've had numerous developers get tripped up on this issue since everywhere else in Pig variables are separated by commas, so I propose we fix it.

I propose we trim leading/trailing commas from column names, but I'm open to other ideas. Also, should we accept column names that are comma-delimited without spaces?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
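The behavior described in the issue can be reproduced with a small sketch. It assumes the pre-patch parsing amounts to splitting the constructor argument on whitespace only; the class and method names are hypothetical and are not taken from the Pig source.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveColumnSplit {
    // Assumption: columns are separated by whitespace and nothing else,
    // so a comma typed after a column name stays attached to that name.
    static List<String> splitOnSpaces(String spec) {
        return Arrays.asList(spec.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // The first element keeps its trailing comma.
        System.out.println(splitOnSpaces("info:first_name, info:last_name"));
    }
}
```

Under that assumption the first column name comes back as {{info:first_name,}}, which matches the symptom reported above.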
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065737#comment-13065737 ]

Dmitriy V. Ryaboy commented on PIG-1946:

Looks good. The test passed.

The purist in me wants to make you do "true".equalsIgnoreCase(value) instead of (value == null || !value.equalsIgnoreCase("true")), and to pull out the column parsing behavior into its own function.

You generated the patch from git -- that's not friendly to the automated patch machinery; you have to use {code}git diff --no-prefix{code} to generate legit patches.

If you have time to make the changes, that would be awesome. If not, I will try to fit it in. Thanks for the work on this!
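The review note about comparing against the literal can be illustrated as follows. This is only a sketch: the helper names are hypothetical, and the patch itself is not reproduced here. Calling equalsIgnoreCase on the string literal is null-safe, so the explicit null check becomes unnecessary.

```java
final class FlagParsing {
    // Hypothetical helper: true only when the value is "true" in any letter case.
    // String.equalsIgnoreCase returns false for a null argument, so no null check is needed.
    static boolean isEnabled(String value) {
        return "true".equalsIgnoreCase(value);
    }

    // Negated form; behaves like (value == null || !value.equalsIgnoreCase("true")).
    static boolean isDisabled(String value) {
        return !"true".equalsIgnoreCase(value);
    }

    public static void main(String[] args) {
        System.out.println(isEnabled(null));
        System.out.println(isEnabled("TRUE"));
        System.out.println(isDisabled("no"));
    }
}
```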
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064956#comment-13064956 ]

Thejas M Nair commented on PIG-1946:

bq. +1 on that last suggestion, sounds good to me. I'll work on the patch.

Removing the patch-available state as Bill is working on a new patch. The patch-available state is used for finding JIRAs that are ready for review.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046262#comment-13046262 ]

Bill Graham commented on PIG-1946:

+1 on that last suggestion, sounds good to me. I'll work on the patch.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044955#comment-13044955 ]

Eric Yang commented on PIG-1946:

Taking it one step further: what does -constructorDelimiter look like with a space delimiter?
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044969#comment-13044969 ]

Bill Graham commented on PIG-1946:

I was thinking any combination of comma and space would always be used to delimit by default. If -constructorDelimiter is specified we can either say:

a.) same logic, but substitute -constructorDelimiter for the comma; or
b.) only use -constructorDelimiter as the delimiter, in which case spaces would be part of the column descriptor if found.

I lean towards a.), but if allowing people to use spaces in column names is valid, then we should do b.). Thoughts?
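The two options in this comment can be contrasted with a short sketch. The class and method names are hypothetical, the delimiter is assumed to be a non-empty string, and nothing here is taken from the attached patches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class DelimiterOptions {
    // Option a.): any run of the delimiter and/or whitespace separates columns.
    static List<String> splitLenient(String spec, String delimiter) {
        String regex = "(?:" + Pattern.quote(delimiter) + "|\\s)+";
        List<String> columns = new ArrayList<>();
        for (String token : spec.split(regex)) {
            if (!token.isEmpty()) {
                columns.add(token); // skip the empty token produced by a leading separator
            }
        }
        return columns;
    }

    // Option b.): only the delimiter separates columns; spaces remain part of the name.
    static List<String> splitStrict(String spec, String delimiter) {
        return Arrays.asList(spec.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        System.out.println(splitLenient("info:first_name, info:last_name", ","));
        System.out.println(splitLenient("info:a;info:b c", ";"));
        System.out.println(splitStrict("info:a;info:b c", ";"));
    }
}
```

With a custom delimiter of ";", the lenient variant treats the space in "info:b c" as a separator, while the strict variant keeps "info:b c" as a single column descriptor. That is the trade-off raised in the comment.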
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044980#comment-13044980 ]

Eric Yang commented on PIG-1946:

a) should be the sensible thing to do. b) is unlikely to mix well with the rest of Pig syntax anyway.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044664#comment-13044664 ]

Dmitriy V. Ryaboy commented on PIG-1946:

Coming back to this after a long break.. We can do the standard backslash escaping (with \\ meaning a real backslash) but that will look horrible in Pig code, where you have to escape each backslash for the parser first. Maybe an optional -literalColumns flag that makes us revert to space-delimited column names? I don't really see an elegant way out of this right now.
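For reference, the backslash-escaping idea mentioned above could look roughly like this. It is a sketch of the approach that was considered, not of what was committed; the class name and the trimming of surrounding spaces are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapedColumnSplitter {
    // Splits on unescaped commas. "\," yields a literal comma and "\\" a literal backslash.
    static List<String> split(String spec) {
        List<String> columns = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean escaped = false;
        for (char c : spec.toCharArray()) {
            if (escaped) {
                current.append(c); // take the escaped character literally
                escaped = false;
            } else if (c == '\\') {
                escaped = true;    // the next character loses its special meaning
            } else if (c == ',') {
                columns.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        columns.add(current.toString().trim());
        return columns;
    }

    public static void main(String[] args) {
        // In Java source the escape is already doubled; a Pig script would need
        // its own layer of escaping on top, which is the readability concern above.
        System.out.println(split("info:a\\,b, info:c"));
    }
}
```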
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044690#comment-13044690 ]

Bill Graham commented on PIG-1946:

How about we say that commas are the default delimiter (with or without spaces), but this can be overridden with the -constructorDelimiter param? That would allow those pesky comma users to specify some other character as their delimiter. Kinda like sed.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033148#comment-13033148 ]

Bill Graham commented on PIG-1946:

The column descriptors take anything I can throw at them:

{code}
hbase(main):001:0> create 't1', {NAME => 'f1', VERSIONS => 5}
0 row(s) in 0.6400 seconds

hbase(main):002:0> put 't1', 'r1', 'f1:!@#$%)(:+_-=\][{}|;:,./?`~', 'value'
0 row(s) in 0.0660 seconds
{code}

I'm also able to create column families with both '/' and '\' in them. Any suggestions for a valid encoding scheme?
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031050#comment-13031050 ]

Bill Graham commented on PIG-1946:

No problem. Dmitriy, have you found the list of valid characters for column descriptors? I feel like I've seen it before somewhere but I can't find it. After a quick test, I can write a column descriptor with any character I try, so I'm at a loss for a valid escaping scheme.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031065#comment-13031065 ]

Dmitriy V. Ryaboy commented on PIG-1946:

This is for column families: http://hbase.apache.org/xref/org/apache/hadoop/hbase/HColumnDescriptor.html#278 (no slashes, colons, or ISOControl chars, and no starting with .). I believe columns are similar.
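The family-name rules as summarized in this comment can be written down as a small check. This sketch mirrors the comment's wording rather than the linked HBase source; the class name is hypothetical, and treating "slashes" as both '/' and '\' and rejecting empty names are assumptions.

```java
final class FamilyNameCheck {
    // Mirrors the rules as described in the comment above, not the HBase implementation.
    static boolean looksLegal(String family) {
        if (family == null || family.isEmpty() || family.charAt(0) == '.') {
            return false; // assumption: empty names are rejected along with a leading '.'
        }
        for (int i = 0; i < family.length(); i++) {
            char c = family.charAt(i);
            if (c == '/' || c == '\\' || c == ':' || Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLegal("info"));
        System.out.println(looksLegal("in:fo"));
        System.out.println(looksLegal("a,b"));
    }
}
```

Note that a comma passes this check, which is consistent with the observation elsewhere in the thread that commas are valid in column names.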
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031006#comment-13031006 ]

Dmitriy V. Ryaboy commented on PIG-1946:

Bill, sorry it took me a while to get to this. Looks good, but I just confirmed that commas are valid in column names.. we should add escaping.
[jira] [Commented] (PIG-1946) HBaseStorage constructor syntax is error prone
[ https://issues.apache.org/jira/browse/PIG-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013779#comment-13013779 ]

Eric Yang commented on PIG-1946:

An alternative is to modify the syntax like:

{noformat}
STORE result INTO 'hbase://foo' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'info:first_name info:last_name');
{noformat}

This eliminates the comma from the syntax completely, though the approach may have some readability issues. That said, the problem can be solved in a more user-friendly manner by improving the parser to filter out prefix and suffix commas, and comma-only cases. Maybe this issue can be defined more accurately as: Improve syntax parsing for HBaseStorage constructor.
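The parser-side filtering suggested here might be sketched like this. The class name is hypothetical and the code is an illustration of the suggestion, assuming columns are still separated by whitespace; it is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class CommaTrimmingParser {
    // Splits on whitespace, strips leading and trailing commas from each token,
    // and drops tokens that consist of commas only.
    static List<String> parse(String spec) {
        List<String> columns = new ArrayList<>();
        for (String token : spec.trim().split("\\s+")) {
            String name = token.replaceAll("^,+|,+$", "");
            if (!name.isEmpty()) {
                columns.add(name);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(parse("info:first_name, info:last_name"));
        System.out.println(parse("info:first_name , info:last_name"));
    }
}
```

A spec such as "info:a,info:b" with no spaces would still come back as a single token under this sketch, so it does not settle the question of comma-delimited names without spaces.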