[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended

2013-03-25 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613554#comment-13613554
 ] 

Prashant Kommireddi commented on PIG-3261:
--

Thanks [~qwertymaniac]. Do you think the user-set classpath should always be 
added at the beginning? Or would it make sense to have a property similar to 
HADOOP_USER_CLASSPATH_FIRST?
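
To make the second option concrete, here is a minimal sketch of what such a 
switch could look like in a launcher script. PIG_USER_CLASSPATH_FIRST and 
add_user_classpath are hypothetical names invented for this illustration; 
neither is an existing Pig setting or part of the attached patch, and the paths 
are example values only.

```shell
#!/bin/sh
# PIG_USER_CLASSPATH_FIRST is a hypothetical variable used only for this
# sketch. It is modeled on HADOOP_USER_CLASSPATH_FIRST and is not an
# existing Pig setting.
add_user_classpath() {
    # Nothing to do when the user supplied no extra entries.
    [ -n "$PIG_CLASSPATH" ] || return 0
    if [ -z "$CLASSPATH" ]; then
        CLASSPATH="$PIG_CLASSPATH"
    elif [ -n "$PIG_USER_CLASSPATH_FIRST" ]; then
        # Opt-in: user entries come first and therefore take precedence.
        CLASSPATH="${PIG_CLASSPATH}:${CLASSPATH}"
    else
        # Default: keep the current append behavior.
        CLASSPATH="${CLASSPATH}:${PIG_CLASSPATH}"
    fi
}

# Example values only.
CLASSPATH="/opt/hadoop/lib/hadoop-core.jar"
PIG_CLASSPATH="/home/me/lib/override.jar"
PIG_USER_CLASSPATH_FIRST=true
add_user_classpath
echo "$CLASSPATH"
```

With the switch unset the result is the same as today, so existing setups would 
not change unless a user opts in.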

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not 
> appended
> ---
>
> Key: PIG-3261
> URL: https://issues.apache.org/jira/browse/PIG-3261
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.10.0
> Reporter: Harsh J
> Assignee: Harsh J
> Attachments: PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     # User entries are appended, so anything already on CLASSPATH wins.
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH before that point can never be 
> overridden by the user's environment, because entries earlier on the 
> classpath take precedence. The Hadoop libs, for example, are added to 
> CLASSPATH before this extension runs in bin/pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-25 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613528#comment-13613528
 ] 

Prashant Kommireddi commented on PIG-3259:
--


{quote} The check you have here does not accept all valid double string 
representations {quote} - thanks for noticing that.

{quote} One way to avoid performance degradation for the 'correct' case would 
be to start by doing .valueOf() without checks, then use the number of 
non-numbers encountered to decide whether we want to make the 
sanityCheckIntegerLongDecimal() calls {quote} - I am not clear on the advantage 
here. How do we determine the number of non-numbers without making calls to 
sanityCheck..()?

> Optimize byte to Long/Integer conversions
> -
>
> Key: PIG-3259
> URL: https://issues.apache.org/jira/browse/PIG-3259
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11, 0.11.1
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric 
> (1234abcd), the code still calls Double.valueOf(String) before finally 
> returning null. Any script that inadvertently (user's mistake or not) tries 
> to cast a non-numeric column to int or long would result in many wasteful 
> calls. 
> We can avoid this by handling only the cases where we find the input to be a 
> decimal number (1234.56), and returning null otherwise even before trying 
> Double.valueOf(String).



[jira] [Resolved] (PIG-2470) Issue with CSVEXcelStorage piggy bank function

2013-03-25 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-2470.


Resolution: Fixed
Fix Version/s: 0.12
Assignee: Jonathan Packer  (was: Prashant Kommireddi)

Closing the jira since it's fixed as part of PIG-3141.

> Issue with CSVEXcelStorage piggy bank function
> --
>
> Key: PIG-2470
> URL: https://issues.apache.org/jira/browse/PIG-2470
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.9.0
> Reporter: Priya Karkele
> Assignee: Jonathan Packer
> Fix For: 0.12
>
> Attachments: PIG-2470_2.patch, PIG-2470.patch
>
>
> The CSVExcelStorage piggybank function skips any record that has one or more 
> null columns. The record is not written to the file.



[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-03-25 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3141:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1. Committed to trunk. Thanks Jonathan P!

Note that I got rid of all the ^M's in the following files while committing 
them:
* contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java
* contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java


> Giving CSVExcelStorage an option to handle header rows
> --
>
> Key: PIG-3141
> URL: https://issues.apache.org/jira/browse/PIG-3141
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Affects Versions: 0.11
> Reporter: Jonathan Packer
> Assignee: Jonathan Packer
> Fix For: 0.12
>
> Attachments: csv.patch, csv_updated.patch, PIG-3141_update_3.diff, 
> PIG-3141_update_4.diff
>
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This 
> works properly with multiple small files each with a header being combined 
> into one split, or a large file with a single header being split into 
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug 
> involving quoted fields at the end of a line not escaping properly.



Re: Review Request: PIG-3141 [piggybank] Giving CSVExcelStorage an option to handle header rows

2013-03-25 Thread Cheolsoo Park

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9697/#review18379
---

Ship it!


Ship It!

- Cheolsoo Park


On March 25, 2013, 3:17 p.m., Jonathan Packer wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9697/
> ---
> 
> (Updated March 25, 2013, 3:17 p.m.)
> 
> 
> Review request for pig.
> 
> 
> Description
> ---
> 
> Reviewboard for https://issues.apache.org/jira/browse/PIG-3141
> 
> Adds a "header treatment" option to CSVExcelStorage allowing header rows 
> (first row with column names) in files to be skipped when loading, or for a 
> header row with column names to be written when storing. Should be backwards 
> compatible--all unit-tests from the old CSVExcelStorage pass.
> 
> 
> Diffs
> -
> 
>   contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java 568b3f3
>   contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java 9bed527
> 
> Diff: https://reviews.apache.org/r/9697/diff/
> 
> 
> Testing
> ---
> 
> cd contrib/piggybank/java
> ant -Dtestcase=TestCSVExcelStorage test
> 
> 
> Thanks,
> 
> Jonathan Packer
> 
>



[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended

2013-03-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated PIG-3261:
-

Status: Patch Available  (was: Open)




[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended

2013-03-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated PIG-3261:
-

Attachment: PIG-3261.patch




[jira] [Created] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended

2013-03-25 Thread Harsh J (JIRA)
Harsh J created PIG-3261:


Summary: User set PIG_CLASSPATH entries must be prepended to the 
CLASSPATH, not appended
Key: PIG-3261
URL: https://issues.apache.org/jira/browse/PIG-3261
Project: Pig
Issue Type: Bug
Components: grunt
Affects Versions: 0.10.0
Reporter: Harsh J
Assignee: Harsh J
Attachments: PIG-3261.patch

Currently we are doing this wrong:

{code}
if [ "$PIG_CLASSPATH" != "" ]; then
    # User entries are appended, so anything already on CLASSPATH wins.
    CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
fi
{code}

This means that anything added to CLASSPATH before that point can never be 
overridden by the user's environment, because entries earlier on the classpath 
take precedence. The Hadoop libs, for example, are added to CLASSPATH before 
this extension runs in bin/pig.
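
As a rough illustration of the behavior the summary asks for, the sketch below 
prepends the user entries instead of appending them. It is not taken from 
PIG-3261.patch: the helper name prepend_user_classpath and the example paths 
are assumptions made for this illustration only.

```shell
#!/bin/sh
# Hypothetical helper, not from PIG-3261.patch: put user-supplied entries
# ahead of whatever has already been accumulated in CLASSPATH.
prepend_user_classpath() {
    if [ -n "$PIG_CLASSPATH" ]; then
        if [ -n "$CLASSPATH" ]; then
            CLASSPATH="${PIG_CLASSPATH}:${CLASSPATH}"
        else
            # Avoid a trailing ":" when nothing was on CLASSPATH yet.
            CLASSPATH="${PIG_CLASSPATH}"
        fi
    fi
}

# Example values only.
CLASSPATH="/opt/hadoop/lib/hadoop-core.jar"
PIG_CLASSPATH="/home/me/lib/override.jar"
prepend_user_classpath
echo "$CLASSPATH"
```

With these example values the user jar ends up ahead of the Hadoop jar, so a 
class present in both would be loaded from the user jar.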



[jira] Subscription: PIG patch available

2013-03-25 Thread jira
Issue Subscription
Filter: PIG patch available (34 issues)

Subscriber: pigdaily

Key       Summary
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
          https://issues.apache.org/jira/browse/PIG-3247
PIG-3238  Pig current releases lack a UDF Stuff(). This UDF deletes a
          specified length of characters and inserts another set of
          characters at a specified starting point.
          https://issues.apache.org/jira/browse/PIG-3238
PIG-3237  Pig current releases lack a UDF MakeSet(). This UDF returns a set
          value (a string containing substrings separated by "," characters)
          consisting of the strings that have the corresponding bit in the
          first argument
          https://issues.apache.org/jira/browse/PIG-3237
PIG-3223  AvroStorage does not handle comma separated input paths
          https://issues.apache.org/jira/browse/PIG-3223
PIG-3215  [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated
          Values) files
          https://issues.apache.org/jira/browse/PIG-3215
PIG-3210  Pig fails to start when it cannot write log to log files
          https://issues.apache.org/jira/browse/PIG-3210
PIG-3198  Let users use any function from PigType -> PigType as if it were
          builtlin
          https://issues.apache.org/jira/browse/PIG-3198
PIG-3193  Fix "ant docs" warnings
          https://issues.apache.org/jira/browse/PIG-3193
PIG-3190  Add LuceneTokenizer and SnowballTokenizer to Pig - useful text
          tokenization
          https://issues.apache.org/jira/browse/PIG-3190
PIG-3183  rm or rmf commands should respect globbing/regex of path
          https://issues.apache.org/jira/browse/PIG-3183
PIG-3173  Partition filter push down does not happen partition keys condition
          include a AND and OR construct
          https://issues.apache.org/jira/browse/PIG-3173
PIG-3166  Update eclipse .classpath according to ivy library.properties
          https://issues.apache.org/jira/browse/PIG-3166
PIG-3164  Pig current releases lack a UDF endsWith.This UDF tests if a given
          string ends with the specified suffix.
          https://issues.apache.org/jira/browse/PIG-3164
PIG-3141  Giving CSVExcelStorage an option to handle header rows
          https://issues.apache.org/jira/browse/PIG-3141
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3122  Operators should not implicitly become reserved keywords
          https://issues.apache.org/jira/browse/PIG-3122
PIG-3114  Duplicated macro name error when using pigunit
          https://issues.apache.org/jira/browse/PIG-3114
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly
          without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden
          multi-line
          https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address
          OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is
          brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2643  Use bytecode generation to make a performance replacement for
          InvokeForLong, InvokeForString, etc
          https://issues.apache.org/jira/browse/PIG-2643
PIG-2641  Create toJSON function for all complex types: tuples, bags and maps
          https://issues.apache.org/jira/browse/PIG-2641
PIG-2591  Unit tests should not write to /tmp but respect java.io.tmpdir
          https://issues.apache.org/jira/browse/PIG-2591
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613354#comment-13613354
 ] 

Thejas M Nair commented on PIG-3259:


Sounds like a good idea. 
The check you have here does not accept all valid double string representations 
(see 
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf(java.lang.String)
), e.g. with an exponent, or a hexadecimal representation starting with 0x.

But if we can avoid the performance degradation for the 'correct' [1] case 
(which seems to be in the range of 2-8% in the micro benchmark that ran for at 
least a few seconds), that would be better. One way to avoid performance 
degradation for the 'correct' case would be to start by doing .valueOf() 
without checks, then use the number of non-numbers encountered to decide 
whether we want to make the sanityCheckIntegerLongDecimal() calls.

[1] By correct I mean the case where a field declared as an integer or a 
double has a correct representation.




Re: [VOTE] Release Pig 0.11.1 (candidate 0)

2013-03-25 Thread Daniel Dai
Yes, it is Ok with me.

Daniel

On Mon, Mar 25, 2013 at 2:44 PM, Julien Le Dem wrote:
> +1
> The full test suite is passing.
> I don't think we need to make a new rc just for one missing license header.
> Daniel, is that OK with you?
> Thanks,
> Julien
>
> On Mon, Mar 25, 2013 at 11:02 AM, Daniel Dai wrote:
>> My fault for missing the license header for
>> UDFContextTestLoaderWithSignature. Added it to both files. Thanks,
>> Prashant!
>>
>> I ran the unit tests and e2e tests; both passed. +1 for the rc except for
>> the license header issue.
>>
>> Daniel
>>
>> On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
>>> Downloaded tarball and performed the following:
>>>
>>>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>>>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>>>    DOTParser.jjt do not have Apache License header.
>>>    2. Verified RELEASE_NOTES.txt for correct version numbers
>>>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>>>    4. Built and tested Piggybank, Built tutorial - looks good.
>>>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>>>    great to have someone else test the same)
>>>    6. ant test-commit - all tests pass
>>>
>>> Except for #1, RC looks good to me.
>>> Thanks,
>>> -Prashant
>>>
>>> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>>>
>>>> Hi,
>>>>
>>>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>>>> release
>>>> of Pig 0.11.
>>>>
>>>> Keys used to sign the release are available at:
>>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>>>
>>>> Please download, test, and try it out:
>>>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>>>
>>>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>>>
>>>> Thanks,
>>>> Bill



Re: [VOTE] Release Pig 0.11.1 (candidate 0)

2013-03-25 Thread Julien Le Dem
+1
The full test suite is passing.
I don't think we need to make a new rc just for one missing license header.
Daniel, is that OK with you?
Thanks,
Julien

On Mon, Mar 25, 2013 at 11:02 AM, Daniel Dai wrote:
> My fault for missing the license header for
> UDFContextTestLoaderWithSignature. Added it to both files. Thanks,
> Prashant!
>
> I ran the unit tests and e2e tests; both passed. +1 for the rc except for
> the license header issue.
>
> Daniel
>
> On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
>> Downloaded tarball and performed the following:
>>
>>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>>    DOTParser.jjt do not have Apache License header.
>>    2. Verified RELEASE_NOTES.txt for correct version numbers
>>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>>    4. Built and tested Piggybank, Built tutorial - looks good.
>>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>>    great to have someone else test the same)
>>    6. ant test-commit - all tests pass
>>
>> Except for #1, RC looks good to me.
>> Thanks,
>> -Prashant
>>
>> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>>
>>> Hi,
>>>
>>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>>> release
>>> of Pig 0.11.
>>>
>>> Keys used to sign the release are available at:
>>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>>
>>> Please download, test, and try it out:
>>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>>
>>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>>
>>> Thanks,
>>> Bill
>>>


[jira] [Resolved] (PIG-3260) Number of languages which support UDF is listed 3 instead of 5

2013-03-25 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-3260.
-

Resolution: Fixed
Fix Version/s: 0.11.1
Assignee: Daniel Dai

Fixed in 0.11 branch. trunk is already fixed. Thanks for reporting!

> Number of languages which support UDF is listed 3 instead of 5
> --
>
> Key: PIG-3260
> URL: https://issues.apache.org/jira/browse/PIG-3260
> Project: Pig
> Issue Type: Bug
> Components: documentation
> Affects Versions: 0.11
> Reporter: Sumod Pawgi
> Assignee: Daniel Dai
> Priority: Trivial
> Fix For: 0.11.1
>
>
> On the Pig UDF page - http://pig.apache.org/docs/r0.11.0/udf.html#udfs, it 
> says that "Pig UDFs can currently be implemented in three languages: Java, 
> Python, JavaScript, Ruby and Groovy." However, that list names five 
> languages. This is very minor, probably a typo, but I thought I should 
> report it.



[jira] [Assigned] (PIG-3259) Optimize byte to Long/Integer conversions

2013-03-25 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-3259:


Assignee: Prashant Kommireddi




Re: [VOTE] Release Pig 0.11.1 (candidate 0)

2013-03-25 Thread Daniel Dai
My fault for missing the license header for
UDFContextTestLoaderWithSignature. Added it to both files. Thanks,
Prashant!

I ran the unit tests and e2e tests; both passed. +1 for the rc except for the
license header issue.

Daniel

On Sun, Mar 24, 2013 at 11:18 PM, Prashant Kommireddi wrote:
> Downloaded tarball and performed the following:
>
>    1. ant releaseaudit - UDFContextTestLoaderWithSignature (
>    http://svn.apache.org/viewvc?view=revision&revision=r1458036) and
>    DOTParser.jjt do not have Apache License header.
>    2. Verified RELEASE_NOTES.txt for correct version numbers
>    3. Verified build.xml points to next version (0.11.2) SNAPSHOT
>    4. Built and tested Piggybank, Built tutorial - looks good.
>    5. Tested jar by running scripts against 0.20.2 hadoop cluster (would be
>    great to have someone else test the same)
>    6. ant test-commit - all tests pass
>
> Except for #1, RC looks good to me.
> Thanks,
> -Prashant
>
> On Fri, Mar 22, 2013 at 7:58 AM, Bill Graham wrote:
>
>> Hi,
>>
>> I have created a candidate build for Pig 0.11.1. This is a maintenance
>> release
>> of Pig 0.11.
>>
>> Keys used to sign the release are available at:
>> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
>>
>> Please download, test, and try it out:
>> http://people.apache.org/~billgraham/pig-0.11.1-candidate-0/
>>
>> Should we release this? Vote closes on next Thursday EOD, Mar 28th.
>>
>> Thanks,
>> Bill
>>


[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-03-25 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-3141:
-

Description: 
Adds an argument to CSVExcelStorage to skip the header row when loading. This 
works properly with multiple small files each with a header being combined into 
one split, or a large file with a single header being split into multiple 
splits.

Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug 
involving quoted fields at the end of a line not escaping properly.

  was:
Adds an argument to CSVExcelStorage to skip the header row when loading. This 
works properly with multiple small files each with a header being combined into 
one split, or a large file with a single header being split into multiple 
splits.

Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug 
involving quoted fields at the end of a line not escaping properly.

Removes the choice of delimiter, since a CSV file ought to only use a comma 
delimiter, hence the name.





[jira] [Updated] (PIG-3141) Giving CSVExcelStorage an option to handle header rows

2013-03-25 Thread Jonathan Packer (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Packer updated PIG-3141:
-

Attachment: PIG-3141_update_4.diff

Updated diff with code review changes (also updated on ReviewBoard). Thanks for 
taking a look at this and the fixed-width patch, Cheolsoo.

> Giving CSVExcelStorage an option to handle header rows
> --
>
> Key: PIG-3141
> URL: https://issues.apache.org/jira/browse/PIG-3141
> Project: Pig
> Issue Type: Improvement
> Components: piggybank
> Affects Versions: 0.11
> Reporter: Jonathan Packer
> Assignee: Jonathan Packer
> Fix For: 0.12
>
> Attachments: csv.patch, csv_updated.patch, PIG-3141_update_3.diff, 
> PIG-3141_update_4.diff
>
>
> Adds an argument to CSVExcelStorage to skip the header row when loading. This 
> works properly with multiple small files each with a header being combined 
> into one split, or a large file with a single header being split into 
> multiple splits.
> Also fixes a few bugs with CSVExcelStorage, including PIG-2470 and a bug 
> involving quoted fields at the end of a line not escaping properly.
> Removes the choice of delimiter, since a CSV file ought to only use a comma 
> delimiter, hence the name.



Re: Review Request: PIG-3141 [piggybank] Giving CSVExcelStorage an option to handle header rows

2013-03-25 Thread Jonathan Packer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9697/
---

(Updated March 25, 2013, 3:17 p.m.)


Review request for pig.


Changes
---

Code review changes for Cheolsoo


Description
---

Reviewboard for https://issues.apache.org/jira/browse/PIG-3141

Adds a "header treatment" option to CSVExcelStorage allowing header rows (first 
row with column names) in files to be skipped when loading, or for a header row 
with column names to be written when storing. Should be backwards 
compatible--all unit-tests from the old CSVExcelStorage pass.


Diffs (updated)
-

  
contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/CSVExcelStorage.java 568b3f3
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/TestCSVExcelStorage.java 9bed527

Diff: https://reviews.apache.org/r/9697/diff/


Testing
---

cd contrib/piggybank/java
ant -Dtestcase=TestCSVExcelStorage test


Thanks,

Jonathan Packer