[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588059#comment-13588059
 ] 

Eric Yang commented on PIG-1832:


hi Guido, I think the -timestamp= option makes sense for high-throughput 
systems. We should probably revisit per-cell timestamp writing later. 
This is not a high-priority item for me to work on. If anyone would like to 
tackle this issue, feel free to take it.

> Support timestamp in HBaseStorage when storing
> --
>
> Key: PIG-1832
> URL: https://issues.apache.org/jira/browse/PIG-1832
> Project: Pig
>  Issue Type: Improvement
>Reporter: Eric Yang
>
> When storing data into HBase using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage, HBase timestamp field is 
> stored with insertion time of the mapreduce job.  It would be nice to have a 
> way to populate timestamp from user data.
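To make the request concrete, here is a minimal plain-Java sketch of what "populating the timestamp from user data" could mean for each stored row. It does not use the HBase client or Pig APIs. `CellWrite`, `TimestampedRowMapper` and the idea of passing a per-row timestamp value are hypothetical. When a row carries a timestamp, that value becomes the version of every cell written for the row. Otherwise the sketch falls back to the write time, which mirrors the current behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one HBase cell write (row, column, version, value).
final class CellWrite {
    final String row;
    final String column;   // "family:qualifier"
    final long timestamp;  // cell version, in epoch milliseconds
    final String value;

    CellWrite(String row, String column, long timestamp, String value) {
        this.row = row;
        this.column = column;
        this.timestamp = timestamp;
        this.value = value;
    }
}

final class TimestampedRowMapper {
    // Turns one row into cell writes. If userTimestamp is non-null, it is used as
    // the version of every cell; otherwise the supplied write time is used,
    // which corresponds to storing with the job's insertion time.
    static List<CellWrite> toCells(String rowKey, Map<String, String> columns,
                                   Long userTimestamp, long writeTimeMillis) {
        long ts = (userTimestamp != null) ? userTimestamp : writeTimeMillis;
        List<CellWrite> cells = new ArrayList<>();
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cells.add(new CellWrite(rowKey, e.getKey(), ts, e.getValue()));
        }
        return cells;
    }
}
```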

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3222) New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer

2013-02-26 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588037#comment-13588037
 ] 

Bill Graham commented on PIG-3222:
--

Feng could you attach a sample test script/storer that reproduces the Pig bug 
without HCatalog?

> New UDFContextSignature assignments in Pig 0.11 breaks HCatalog.HCatStorer 
> ---
>
> Key: PIG-3222
> URL: https://issues.apache.org/jira/browse/PIG-3222
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.11
>Reporter: Feng Peng
>  Labels: hcatalog
>
> Pig 0.11 assigns different UDFContextSignatures to different invocations of 
> the same load/store statement. This change breaks HCatStorer, which 
> assumes that all front-end and back-end invocations of the same store statement 
> have the same UDFContextSignature so that it can read the previously stored 
> information correctly.
> The related HCatalog code is in 
> https://svn.apache.org/repos/asf/incubator/hcatalog/branches/branch-0.5/hcatalog-pig-adapter/src/main/java/org/apache/hcatalog/pig/HCatStorer.java
>  (the setStoreLocation() function).
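The failure mode can be modelled without Pig or HCatalog. In the sketch below, `SignatureScopedContext` is a hypothetical, simplified stand-in for Pig's `UDFContext`, not the real class. A storer saves properties under its signature on the front end and reads them back on the back end. If the same store statement receives a different signature on each invocation, as described for Pig 0.11, the back-end lookup comes back empty. The signature strings and the `outputSchema` key are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical, simplified model of a per-signature property store such as
// Pig's UDFContext; it is not the real Pig class.
final class SignatureScopedContext {
    private final Map<String, Properties> bySignature = new HashMap<>();

    // Returns the properties for a signature, creating an empty set on first use.
    Properties propertiesFor(String signature) {
        return bySignature.computeIfAbsent(signature, s -> new Properties());
    }
}

final class SignatureDemo {
    // Simulates a storer that records state on the front end and reads it on
    // the back end. Returns what the back end sees for the key "outputSchema".
    static String roundTrip(SignatureScopedContext ctx,
                            String frontEndSignature, String backEndSignature) {
        ctx.propertiesFor(frontEndSignature).setProperty("outputSchema", "a:int,b:chararray");
        return ctx.propertiesFor(backEndSignature).getProperty("outputSchema");
    }
}
```

With a stable signature, the back end finds the saved value. With a changed signature, it gets `null`, which is the kind of missing state that HCatStorer's `setStoreLocation()` would run into.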



[jira] [Commented] (PIG-3183) rm or rmf commands should respect globbing/regex of path

2013-02-26 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587865#comment-13587865
 ] 

Prashant Kommireddi commented on PIG-3183:
--

[~jcoveney] or others have any comments?

> rm or rmf commands should respect globbing/regex of path
> 
>
> Key: PIG-3183
> URL: https://issues.apache.org/jira/browse/PIG-3183
> Project: Pig
>  Issue Type: Improvement
>  Components: grunt
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3183.patch
>
>
> Hadoop fs commands support globbing when deleting files/dirs. Pig is not 
> consistent with this behavior, and it seems we could change the rm/rmf commands 
> to do the same.
> For example:
> {code}
> localhost:pig pkommireddi$ ls -ld out*
> drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
> drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
> drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
> localhost:pig pkommireddi$ bin/pig -x local
> grunt> rmf out*
> grunt> quit
> localhost:pig pkommireddi$ ls -ld out*
> drwxr-xr-x  12 pkommireddi  SF\domain users  408 Feb 13 01:09 out
> drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out1
> drwxr-xr-x   2 pkommireddi  SF\domain users   68 Feb 13 01:16 out2
> {code}
> Ideally, the user would expect "rmf out*" to delete all of the above dirs.
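Hadoop's `FileSystem` API offers `globStatus(Path)` for expanding patterns such as `out*`. The snippet below does not use Hadoop. It only illustrates the expected matching behaviour with the JDK's `PathMatcher`. `GlobDelete.expand` is a hypothetical helper that selects which entries `rmf out*` would be expected to remove in the listing above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class GlobDelete {
    // Returns the entries whose names match a shell-style glob such as "out*".
    // A real rm/rmf would pass each match to the file system's delete call.
    static List<String> expand(List<String> entries, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matches = new ArrayList<>();
        for (String name : entries) {
            if (matcher.matches(Paths.get(name))) {
                matches.add(name);
            }
        }
        return matches;
    }
}
```

For the directories shown above, `expand(Arrays.asList("out", "out1", "out2"), "out*")` selects all three, because `*` also matches the empty suffix.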



[jira] Subscription: PIG patch available

2013-02-26 Thread jira
Issue Subscription
Filter: PIG patch available (33 issues)

Subscriber: pigdaily

Key       Summary
PIG-3216  Groovy UDFs documentation has minor typos
https://issues.apache.org/jira/browse/PIG-3216
PIG-3215  [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
https://issues.apache.org/jira/browse/PIG-3215
PIG-3210  Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3205  Passing arguments to python script does not work with -f option
https://issues.apache.org/jira/browse/PIG-3205
PIG-3198  Let users use any function from PigType -> PigType as if it were builtlin
https://issues.apache.org/jira/browse/PIG-3198
PIG-3185  Pig release lacks UDF for Initcap function
https://issues.apache.org/jira/browse/PIG-3185
PIG-3184  Pig release lacks UDF for functions rtrim and repeat
https://issues.apache.org/jira/browse/PIG-3184
PIG-3183  rm or rmf commands should respect globbing/regex of path
https://issues.apache.org/jira/browse/PIG-3183
PIG-3166  Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3164  Pig current releases lack a UDF endsWith.This UDF tests if a given string ends with the specified suffix.
https://issues.apache.org/jira/browse/PIG-3164
PIG-3162  PigTest.assertOutput doesn't allow non-default delimiter
https://issues.apache.org/jira/browse/PIG-3162
PIG-3144  Erroneous map entry alias resolution leading to "Duplicate schema alias" errors
https://issues.apache.org/jira/browse/PIG-3144
PIG-3142  Fixed-width load and store functions for the Piggybank
https://issues.apache.org/jira/browse/PIG-3142
PIG-3136  Introduce a syntax making declared aliases optional
https://issues.apache.org/jira/browse/PIG-3136
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3122  Operators should not implicitly become reserved keywords
https://issues.apache.org/jira/browse/PIG-3122
PIG-3114  Duplicated macro name error when using pigunit
https://issues.apache.org/jira/browse/PIG-3114
PIG-3105  Fix TestJobSubmission unit test failure.
https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3081  Pig progress stays at 0% for the first job in hadoop 23
https://issues.apache.org/jira/browse/PIG-3081
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
https://issues.apache.org/jira/browse/PIG-3028
PIG-3027  pigTest unit test needs a newline filter for comparisons of golden multi-line
https://issues.apache.org/jira/browse/PIG-3027
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
https://issues.apache.org/jira/browse/PIG-3010
PIG-2959  Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2643  Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
https://issues.apache.org/jira/browse/PIG-2643
PIG-2641  Create toJSON function for all complex types: tuples, bags and maps
https://issues.apache.org/jira/browse/PIG-2641
PIG-2591  Unit tests should not write to /tmp but respect java.io.tmpdir
https://issues.apache.org/jira/browse/PIG-2591
PIG-1914  Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3199) Expose LogicalPlan via PigServer API

2013-02-26 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587510#comment-13587510
 ] 

Prashant Kommireddi commented on PIG-3199:
--

Mainly because we don't need the extra steps of running the optimizer, 
generating the physical plan, and generating the MR plan to get to this information. It 
just seems that querying the LP for sources/sinks or load/store funcs is more efficient. 
I'd be happy to hear your thoughts.

> Expose LogicalPlan via PigServer API
> 
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.10.0
>Reporter: Prashant Kommireddi
>Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199.patch
>
>
> LogicalPlan could be exposed to the user so that one can run validations 
> based on it. For example, one could get Load/Store paths or other operators and be 
> able to perform checks such as whether the I/O paths are valid.
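As an illustration of the kind of check described above, here is a plain-Java sketch that validates I/O paths once load and store locations have been extracted from a plan. It does not use Pig's `LogicalPlan` classes. `IoOperator` and `PathValidator` are hypothetical, and the rules used here (inputs must exist, outputs must not already exist) are assumptions chosen for the example. It checks the local file system only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical summary of a load or store operator extracted from a plan.
final class IoOperator {
    enum Kind { LOAD, STORE }

    final Kind kind;
    final String location;

    IoOperator(Kind kind, String location) {
        this.kind = kind;
        this.location = location;
    }
}

final class PathValidator {
    // Returns one message per problem: a LOAD whose input is missing, or a
    // STORE whose output already exists (an assumed rule for this sketch).
    static List<String> validate(List<IoOperator> ops) {
        List<String> problems = new ArrayList<>();
        for (IoOperator op : ops) {
            Path p = Paths.get(op.location);
            if (op.kind == IoOperator.Kind.LOAD && !Files.exists(p)) {
                problems.add("missing input: " + op.location);
            } else if (op.kind == IoOperator.Kind.STORE && Files.exists(p)) {
                problems.add("output exists: " + op.location);
            }
        }
        return problems;
    }
}
```

The benefit argued for in the comment above is that this information would already be available from the logical plan, without compiling physical or MR plans first.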



[jira] [Commented] (PIG-3212) Race Conditions in POSort and (Internal)SortedBag during Proactive Spill.

2013-02-26 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587361#comment-13587361
 ] 

Dmitriy V. Ryaboy commented on PIG-3212:


OK, so I think the fix works, but it seems like this bug has been there for a 
long time -- the synchronization issue with SMM would've been there before 
0.11 as well. Is it just more visible now because SMM is faster (fewer bags to 
go through)? 

It seems unlikely that we ever get to spill an internal sorted bag, given this 
patch; it seems like it almost always has an iterator open. If the concern is 
using the same comparator, could we not solve this by initializing a new 
comparator for every bag?
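The thread-safety concern can be illustrated outside Pig. Below, a hypothetical `GuardedSortedBag` is shared between a memory-manager thread calling `spill()` and a reader thread reading from it. Both paths lock the same monitor, each bag owns its own comparator, and `spill()` declines to run while an iterator is open, which is the behaviour the comment above suggests makes spilling rare. This is only a sketch, not the PIG-3212 patch. The in-memory "spilled run" stands in for writing a sorted run to disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sorted bag guarded by its own monitor; not Pig's InternalSortedBag.
final class GuardedSortedBag<T> {
    private final Comparator<T> comparator;  // owned by this bag; pass a fresh one per bag
    private final List<T> inMemory = new ArrayList<>();
    private final List<List<T>> spilledRuns = new ArrayList<>();
    private int openIterators = 0;

    GuardedSortedBag(Comparator<T> comparator) {
        this.comparator = comparator;
    }

    synchronized void add(T item) {
        inMemory.add(item);
    }

    // Called from the memory-manager thread. Sorts the in-memory items and moves
    // them to a "spilled" run, unless a reader currently has an iterator open.
    synchronized int spill() {
        if (openIterators > 0 || inMemory.isEmpty()) {
            return 0;
        }
        List<T> run = new ArrayList<>(inMemory);
        Collections.sort(run, comparator);
        spilledRuns.add(run);
        inMemory.clear();
        return run.size();
    }

    // Called from the reader thread. Returns a sorted snapshot of all items and
    // keeps spill() disabled until closeIterator() is called.
    synchronized List<T> openIterator() {
        openIterators++;
        List<T> all = new ArrayList<>(inMemory);
        for (List<T> run : spilledRuns) {
            all.addAll(run);
        }
        Collections.sort(all, comparator);
        return all;
    }

    synchronized void closeIterator() {
        openIterators--;
    }
}
```

Because both methods synchronize on the bag, a spill can never reorder data while a reader is building its view. The price is that spilling is skipped whenever a reader is active.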

> Race Conditions in POSort and (Internal)SortedBag during Proactive Spill.
> -
>
> Key: PIG-3212
> URL: https://issues.apache.org/jira/browse/PIG-3212
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11
>Reporter: Kai Londenberg
>Priority: Critical
> Fix For: 0.12, 0.11.1
>
> Attachments: PIG-3212-p1.patch
>
>
> The following bug exists in the latest release of Pig 0.11.0
> While running some large jobs involving groups and sorts like these:
> {code}
> events_by_user = GROUP events BY user_id;
> sorted_events_by_user = FOREACH events_by_user {
>   A = ORDER events BY ts, split_idx, line_num;
>   GENERATE group, A;
> }
> {code}
> I observed some strange behaviour: while this worked on small datasets, on 
> large datasets the results were sometimes not sorted correctly. 
> After a long debugging session, I tracked it down to at least one race 
> condition:
> The following partial stack trace shows how a proactive spill gets triggered 
> on an InternalSortedBag. A spill in turn triggers a sort of that 
> InternalSortedBag.
> {code}
>   at org.apache.pig.data.SortedSpillBag.proactive_spill(SortedSpillBag.java:83)
>   at org.apache.pig.data.InternalSortedBag.spill(InternalSortedBag.java:455)
>   at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:243)
>   at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
>   at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
>   at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
>   at sun.management.Sensor.trigger(Sensor.java:120)
> {code}
> At the same time, the same InternalSortedBag might be sorted or accessed 
> within a POSort operation, for example via the following code path (line 
> numbers might be off, since I had to add debug statements to diagnose this):
> {code}
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort.getNext(POSort.java:346)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:492)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:582)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.getNext(PORelationToExprProject.java:107)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:394)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:372)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:297)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:368)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:214)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:465)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:433)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:413)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:257)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> {code}
> The key here is: Bot

[jira] [Commented] (PIG-3067) HBaseStorage should be split up to become more manageable

2013-02-26 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587338#comment-13587338
 ] 

Dmitriy V. Ryaboy commented on PIG-3067:


Just chiming in to say thanks, I like where this is going as well.

> HBaseStorage should be split up to become more manageable
> -
>
> Key: PIG-3067
> URL: https://issues.apache.org/jira/browse/PIG-3067
> Project: Pig
>  Issue Type: Improvement
>Reporter: Christoph Bauer
>Assignee: Christoph Bauer
> Attachments: hbasestorage-split.patch
>
>
> HBaseStorage has become quite big (>1100 lines).
> I propose to split it up into more manageable parts. I believe it will become 
> a lot easier to maintain.
> I split it up like this:
> HBaseStorage
> * settings:LoadStoreFuncSettings
> ** options
> ** caster
> ** udfProperties
> ** contextSignature
> ** columns:ColumnInfo - moved to its own class-file
> * loadFuncDelegate:HBaseLoadFunc - LoadFunc implementation
> ** settings:LoadStoreFuncSettings (s.a.)
> ** scanner:HBaseLoadFuncScanner - everything scan-specific
> ** tupleIterator:HBaseTupleIterator - interface for _public Tuple getNext()_
> * storeFuncDelegate:HBaseStorFunc - StoreFunc implementation
> ** settings:LoadStoreFuncSettings (s.a.)
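The proposed split is essentially delegation over shared settings. The following plain-Java sketch shows that shape with heavily simplified types. It does not implement Pig's `LoadFunc`/`StoreFunc` contracts. `LoadStoreFuncSettings` borrows its name from the proposal, and `LoadDelegate`, `StoreDelegate` and `SplitHBaseStorage` are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Shared configuration, playing the role of the proposed LoadStoreFuncSettings.
final class LoadStoreFuncSettings {
    final Properties options = new Properties();
    final List<String> columns = new ArrayList<>();
    String contextSignature;
}

// Hypothetical load-side delegate: everything scan-specific would live here.
final class LoadDelegate {
    private final LoadStoreFuncSettings settings;

    LoadDelegate(LoadStoreFuncSettings settings) { this.settings = settings; }

    String describeScan() {
        return "scan columns " + settings.columns;
    }
}

// Hypothetical store-side delegate: everything write-specific would live here.
final class StoreDelegate {
    private final LoadStoreFuncSettings settings;

    StoreDelegate(LoadStoreFuncSettings settings) { this.settings = settings; }

    String describeWrite(String rowKey) {
        return "write " + rowKey + " to " + settings.columns;
    }
}

// The public facade only wires the pieces together and forwards calls.
final class SplitHBaseStorage {
    // Field initializers run in textual order, so settings exists before the delegates.
    private final LoadStoreFuncSettings settings = new LoadStoreFuncSettings();
    private final LoadDelegate loader = new LoadDelegate(settings);
    private final StoreDelegate storer = new StoreDelegate(settings);

    SplitHBaseStorage(List<String> columns) {
        settings.columns.addAll(columns);
    }

    String describeScan() { return loader.describeScan(); }

    String describeWrite(String rowKey) { return storer.describeWrite(rowKey); }
}
```

Both delegates see the same settings object, so options parsed once in the facade stay consistent between loading and storing.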



[jira] [Updated] (PIG-3067) HBaseStorage should be split up to become more manageable

2013-02-26 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-3067:
-

Summary: HBaseStorage should be split up to become more manageable  (was: 
HBaseStorage should be split up to become more managable)



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587328#comment-13587328
 ] 

Bill Graham commented on PIG-1832:
--

I don't think there is a ticket to support returning multiple cell versions 
with timestamps, but we did discuss ideas for an approach here:

https://issues.apache.org/jira/browse/PIG-1782?focusedCommentId=12988192&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12988192

Basically, the idea is to create a new class to support this, since it would be 
fundamentally very different from what we currently support with 
{{HBaseStorage}}. That work might be better handled after we tackle PIG-3067 
(HBaseStorage should be split up to become more manageable).
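To illustrate what returning multiple cell versions for a time range might involve, here is a plain-Java model of one cell's version history. It is a sketch under stated assumptions. `VersionedCell` is hypothetical. The half-open range `[minTs, maxTs)` follows the way HBase time ranges are usually described (minimum inclusive, maximum exclusive). Results are listed newest first, the order in which HBase returns versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one HBase cell: every write adds a version keyed by timestamp.
final class VersionedCell {
    private final NavigableMap<Long, String> versions = new TreeMap<>();

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns "timestamp=value" strings for versions with minTs <= ts < maxTs,
    // newest first.
    List<String> versionsInRange(long minTs, long maxTs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e :
                versions.subMap(minTs, true, maxTs, false).descendingMap().entrySet()) {
            out.add(e.getKey() + "=" + e.getValue());
        }
        return out;
    }
}
```

In a Pig loader, each returned version could become one tuple in a bag. This is one possible reason such a loader would look "fundamentally very different" from today's one-value-per-column output.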



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587268#comment-13587268
 ] 

Guido Serra aka Zeph commented on PIG-1832:
---

s/imaging/imagine



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587267#comment-13587267
 ] 

Guido Serra aka Zeph commented on PIG-1832:
---

p.s. [~billgraham] I can't find a ticket addressing outputting the 
timestamp... I mean, imaging I'd like to see multiple versions, given a time 
range... (OK, I guess I need to create a feature ticket for that)



[jira] [Updated] (PIG-3216) Groovy UDFs documentation has minor typos

2013-02-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3216:


Status: Patch Available  (was: Open)

> Groovy UDFs documentation has minor typos
> -
>
> Key: PIG-3216
> URL: https://issues.apache.org/jira/browse/PIG-3216
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.11
>Reporter: Mathias Herberts
>Assignee: Mathias Herberts
>Priority: Trivial
> Attachments: PIG-3216.patch
>
>




[jira] [Commented] (PIG-3002) Pig client should handle CountersExceededException

2013-02-26 Thread Jarek Jarcec Cecho (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587246#comment-13587246
 ] 

Jarek Jarcec Cecho commented on PIG-3002:
-

Hi [~billgraham], 
thank you very much for taking a look at this JIRA and my patch. In my early work I 
considered a solution similar to the one you proposed, but I noticed 
one side effect while experimenting with my early patches.

I created a rather pathological case in which my cluster used the default 
configuration, but I limited the number of allowed counters to 3 on the machine 
where I executed Pig. I noticed that with a similar fix, Pig will print out a 
couple of counters and then bail out with an exception on the first non-existent 
counter. As a result, not all the counters will be printed, even though they 
are available in the {{Counters}} object.

My experiment is obviously not entirely realistic, as it's unlikely that users will 
have a different Hadoop configuration on the client. However, I believe it models the edge 
case where a MapReduce job creates almost all available counters, but 
because the client iterates over a predefined set, not all of them are 
printed out.

I also went one step further and put the {{try-catch}} block inside the 
{{for}} loop. I noticed that in this situation we might print the 
error message several times, which is rather distracting. This led me to the 
idea of making the changes in the shim layer, which is what I've submitted.

Jarcec

> Pig client should handle CountersExceededException
> --
>
> Key: PIG-3002
> URL: https://issues.apache.org/jira/browse/PIG-3002
> Project: Pig
>  Issue Type: Bug
>Reporter: Bill Graham
>Assignee: Jarek Jarcec Cecho
>  Labels: newbie, simple
> Attachments: PIG-3002.2.patch, PIG-3002.patch
>
>
> Running a pig job that uses more than 120 counters will succeed, but a grunt 
> exception will occur when trying to output counter info to the console. This 
> exception should be caught and handled with friendly messaging:
> {noformat}
> org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1275)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
> at org.apache.pig.PigServer.execute(PigServer.java:1239)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:333)
> at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:136)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:197)
> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:169)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> at org.apache.pig.Main.run(Main.java:604)
> at org.apache.pig.Main.main(Main.java:154)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.hadoop.mapred.Counters$CountersExceededException: Error: Exceeded limits on number of counters - Counters=120 Limit=120
> at org.apache.hadoop.mapred.Counters$Group.getCounterForName(Counters.java:312)
> at org.apache.hadoop.mapred.Counters.findCounter(Counters.java:431)
> at org.apache.hadoop.mapred.Counters.getCounter(Counters.java:495)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.computeWarningAggregate(MapReduceLauncher.java:707)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:442)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> {noformat}
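Here is a minimal sketch of the "friendly messaging" asked for above, under some assumptions. It uses a local stand-in exception instead of Hadoop's `Counters$CountersExceededException`, and a hypothetical `CounterSource` lookup rather than the real `Counters` API. Each counter is read independently, so one failure does not hide the others (the concern raised in the comment above). The warning is reported once rather than on every failed lookup. This is an illustration only, not the shim-layer patch attached to the issue.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for Hadoop's counters-exceeded exception (hypothetical).
class CounterLimitException extends RuntimeException {
    CounterLimitException(String message) { super(message); }
}

// Hypothetical lookup that may fail for counters beyond the client-side limit.
interface CounterSource {
    long get(String name);
}

final class CounterReport {
    final Map<String, Long> values = new LinkedHashMap<>();
    final List<String> warnings = new ArrayList<>();

    // Reads every requested counter, skipping the ones that fail, and records a
    // single friendly warning instead of one message per failed lookup.
    static CounterReport collect(CounterSource source, List<String> names) {
        CounterReport report = new CounterReport();
        int skipped = 0;
        for (String name : names) {
            try {
                report.values.put(name, source.get(name));
            } catch (CounterLimitException e) {
                skipped++;
            }
        }
        if (skipped > 0) {
            report.warnings.add("Counter limit exceeded; " + skipped
                    + " counter(s) could not be read and were not reported.");
        }
        return report;
    }
}
```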



[jira] [Updated] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-1832:
-

Environment: (was: Java 6, Mac OS X 10.6)



[jira] [Updated] (PIG-1832) Support timestamp in HBaseStorage when storing

2013-02-26 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-1832:
-

Summary: Support timestamp in HBaseStorage when storing  (was: Support 
timestamp in HBaseStorage)

> Support timestamp in HBaseStorage when storing
> --
>
> Key: PIG-1832
> URL: https://issues.apache.org/jira/browse/PIG-1832
> Project: Pig
>  Issue Type: Improvement
> Environment: Java 6, Mac OS X 10.6
>Reporter: Eric Yang
>
> When storing data into HBase using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage, HBase timestamp field is 
> stored with insertion time of the mapreduce job.  It would be nice to have a 
> way to populate timestamp from user data.



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

2013-02-26 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587239#comment-13587239
 ] 

Bill Graham commented on PIG-1832:
--

Yes, read via time ranges is done. Work on PIG-2114 seems stalled though and 
there's a lot going on in that patch. I propose this JIRA just add write 
support for -timestamp= for consistency with the 
current read API. That's a quick change that would be useful and would give 
full read/write support for timestamps. That would also help reduce the 
somewhat broad scope of PIG-2114.
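The proposal is for STORE to accept the same `-timestamp=` option that LOAD already understands, with one timestamp applied to every cell the job writes. The sketch below shows only the option handling, in plain Java. HBaseStorage has its own option parsing, so `StoreTimestampOption` is hypothetical, as is the fallback to the current time when the option is absent. A non-numeric value would throw `NumberFormatException`.

```java
import java.util.OptionalLong;

final class StoreTimestampOption {
    private static final String PREFIX = "-timestamp=";

    // Extracts the value of "-timestamp=<millis>" from a space-separated option
    // string such as "-timestamp=1361836800000".
    static OptionalLong parse(String options) {
        for (String token : options.trim().split("\\s+")) {
            if (token.startsWith(PREFIX)) {
                return OptionalLong.of(Long.parseLong(token.substring(PREFIX.length())));
            }
        }
        return OptionalLong.empty();
    }

    // Timestamp to use for every cell written by the job: the option if given,
    // otherwise the supplied current time (the behaviour without the option).
    static long effectiveTimestamp(String options, long nowMillis) {
        return parse(options).orElse(nowMillis);
    }
}
```

Keeping the option name identical for LOAD and STORE is what gives the read/write symmetry described in the comment above.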

> Support timestamp in HBaseStorage
> -
>
> Key: PIG-1832
> URL: https://issues.apache.org/jira/browse/PIG-1832
> Project: Pig
>  Issue Type: Improvement
> Environment: Java 6, Mac OS X 10.6
>Reporter: Eric Yang
>
> When storing data into HBase using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage, HBase timestamp field is 
> stored with insertion time of the mapreduce job.  It would be nice to have a 
> way to populate timestamp from user data.



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

2013-02-26 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587189#comment-13587189
 ] 

Guido Serra aka Zeph commented on PIG-1832:
---

Also, the documentation was just updated (PIG-2341):
 - http://pig.apache.org/docs/r0.11.0/func.html#HBaseStorage

I'd say that supporting "-timestamp=" at both LOAD and STORE is all we 
need.

Right now (as of version 0.11), this option is only taken into account 
at LOAD time.

p.s. There is one scenario, though, which I'm currently covering with a custom 
Python/Jython script, that puzzles me: what if only one cell (row/column 
intersection) changes? HBase by design stores a new entry at a given timestamp 
for all the family:columns provided, even if they are identical. Should we compute 
the difference within HBaseStorage, or should the user handle it?
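If the difference were computed before writing, whether inside HBaseStorage or by the user, the core step is a per-column comparison between the previously stored row and the new one. This plain-Java sketch assumes both rows are already available as column-to-value maps, which HBaseStorage does not provide on its own. `RowDiff.changedColumns` is hypothetical, and deleted columns are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

final class RowDiff {
    // Returns only the columns whose value is new or different from the
    // previously stored row, so unchanged columns would not get a new version.
    // Columns missing from newRow are ignored (no deletes in this sketch).
    static Map<String, String> changedColumns(Map<String, String> storedRow,
                                              Map<String, String> newRow) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : newRow.entrySet()) {
            if (!Objects.equals(storedRow.get(e.getKey()), e.getValue())) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }
}
```

Obtaining `storedRow` would presumably require an extra read per row before each write. That cost is one argument for leaving this step to the user rather than building it into the storer.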



[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

2013-02-26 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587180#comment-13587180
 ] 

Guido Serra aka Zeph commented on PIG-1832:
---

OK, PIG-2886 covers only reading; this issue is about writing, so let's keep
it open.

It seems to be partially addressed in PIG-2114, though. [~eyang], any progress
on your side?


[jira] [Commented] (PIG-1832) Support timestamp in HBaseStorage

2013-02-26 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587174#comment-13587174
 ] 

Guido Serra aka Zeph commented on PIG-1832:
---

[~eyang] As far as I can tell, this is covered by PIG-2886; have a look at it.


[jira] [Resolved] (PIG-3206) HBaseStorage does not work with Oozie pig action and secure HBase

2013-02-26 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-3206.
-

   Resolution: Fixed
Fix Version/s: 0.11.1

Thanks Dmitriy. Checked into 0.11.1 and trunk. Added a new section in 
CHANGES.txt for Release 0.11.1.

> HBaseStorage does not work with Oozie pig action and secure HBase
> -
>
> Key: PIG-3206
> URL: https://issues.apache.org/jira/browse/PIG-3206
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.10.1
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.12, 0.11.1
>
> Attachments: PIG-3206-1.patch
>
>
> HBaseStorage always tries to fetch a delegation token for a secure HBase
> cluster. But when Pig is launched through Oozie, this fails because no TGT is
> available in the map job. In that case, it should reuse the HBase delegation
> token in the JobConf passed to Pig through the
> mapreduce.job.credentials.binary property.


[jira] [Commented] (PIG-3214) New/improved mascot

2013-02-26 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587161#comment-13587161
 ] 

Jonathan Coveney commented on PIG-3214:
---

Thanks for volunteering for this, Prashanth!

> New/improved mascot
> ---
>
> Key: PIG-3214
> URL: https://issues.apache.org/jira/browse/PIG-3214
> Project: Pig
>  Issue Type: Wish
>  Components: site
>Affects Versions: 0.11
>Reporter: Andrew Musselman
>Priority: Minor
> Fix For: 0.12
>
>
> Request to change the Pig mascot to something more graphically appealing.
