[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2013-03-12 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13600648#comment-13600648
 ] 

Jakob Homan commented on PIG-3015:
--

bq. Serious question: is there a reason to put this in Pig rather than keep 
elsewhere, where you can iterate without being tied to Pig's release cycle?

Having tried that with the Avro Serde/Haivvreo, I'd say the code is better 
treated as part of Hive, since it wasn't getting the attention it deserved 
on GitHub.  There's a definite cost to keeping the components in sync, but 
there's a strong benefit to making it easy for people to interact with Avro 
through Pig right out of the box.

> Rewrite of AvroStorage
> --
>
> Key: PIG-3015
> URL: https://issues.apache.org/jira/browse/PIG-3015
> Project: Pig
>  Issue Type: Improvement
>  Components: piggybank
>Reporter: Joseph Adler
>Assignee: Joseph Adler
> Attachments: bad.avro, good.avro, PIG-3015-10.patch, 
> PIG-3015-11.patch, PIG-3015-2.patch, PIG-3015-3.patch, PIG-3015-4.patch, 
> PIG-3015-5.patch, PIG-3015-6.patch, PIG-3015-7.patch, PIG-3015-9.patch, 
> PIG-3015-doc-2.patch, PIG-3015-doc.patch, TestInput.java, Test.java, 
> with_dates.pig
>
>
> The current AvroStorage implementation has a lot of issues: it requires old 
> versions of Avro, it copies data much more than needed, and it's verbose and 
> complicated. (One pet peeve of mine is that old versions of Avro don't 
> support Snappy compression.)
> I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
> new implementation is significantly faster, and the code is a lot simpler. 
> Rewriting AvroStorage also enabled me to implement support for Trevni (as 
> TrevniStorage).
> I'm opening this ticket to facilitate discussion while I figure out the best 
> way to contribute the changes back to Apache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1748) Add load/store function AvroStorage for avro data

2012-09-13 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454985#comment-13454985
 ] 

Jakob Homan commented on PIG-1748:
--

@deb - questions like these should be directed to the pig user list, not JIRA.  
You'll receive assistance there.

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
>Assignee: lin guo
> Fix For: 0.9.0
>
> Attachments: avro_storage.patch, AvroStorageUtils-bagfix.patch, 
> avro_test_files.tar.gz, PIG-1748-2.patch, PIG-1748-3.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
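
The union restriction in item 2 above can be illustrated with a minimal sketch in plain Java, with no Avro dependency. The class name NullableUnionCheck and the representation of a union as a list of type names are assumptions made for illustration; this is not AvroStorage's actual code. Accepting "null" in either position is also an assumption, since the description only shows it first.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of AvroStorage. */
final class NullableUnionCheck {
    private NullableUnionCheck() {}

    /**
     * Returns true when the union has exactly two branches and exactly one
     * of them is "null", e.g. ["null", "string"].
     */
    static boolean isAcceptableUnion(List<String> branchTypes) {
        if (branchTypes == null || branchTypes.size() != 2) {
            return false;
        }
        boolean firstIsNull = "null".equals(branchTypes.get(0));
        boolean secondIsNull = "null".equals(branchTypes.get(1));
        // Exactly one branch may be "null"; ["null", "null"] is rejected.
        return firstIsNull ^ secondIsNull;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptableUnion(Arrays.asList("null", "string")));
        System.out.println(isAcceptableUnion(Arrays.asList("int", "string")));
    }
}
```

A union such as ["int", "string"] has no single Pig type to map to, which is presumably why only the nullable form is accepted.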



[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

2012-08-06 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429456#comment-13429456
 ] 

Jakob Homan commented on PIG-1891:
--

This looks good to me.  +1 on the patch, for what it's worth.  This is what 
we're looking for.  [~billgraham], how does this look to you?

> Enable StoreFunc to make intelligent decision based on job success or failure
> -
>
> Key: PIG-1891
> URL: https://issues.apache.org/jira/browse/PIG-1891
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.10.0
>Reporter: Alex Rovner
>Priority: Minor
>  Labels: patch
> Attachments: PIG-1891-1.patch
>
>
> We are in the process of using Pig for various data processing and component 
> integration. Here is where we feel Pig storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a 
> problem for storage funcs which need to "upload" results into another system:
> DB, FTP, another file system etc.
> I looked at the DBStorage in the piggybank 
> (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup)
>  and what I see is essentially a mechanism which for each task does the 
> following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Executes a commit or rollback depending on whether the task was successful.
> While this approach works great on a task level, it does not work at all on a 
> job level. 
> If certain tasks succeed but the overall job fails, partial records are 
> going to get uploaded into the DB.
> Any ideas on a workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that 
> launches Pig jobs and then uploads to the DB once the Pig job is successful. 
> While the approach works, it's not really integrated into Pig.
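
The gap described above, a task-level commit with no job-level decision, can be modeled with a small plain-Java sketch. Everything here is hypothetical: the class JobLevelCommitSketch and its methods are not Pig or Hadoop APIs, and the two lists merely stand in for a staging area and the external system. The idea shown is that tasks only stage their output and a single job-level step either publishes or discards it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a job-level commit; none of these names are Pig APIs. */
final class JobLevelCommitSketch {
    private final List<String> staged = new ArrayList<>();    // stands in for a staging table
    private final List<String> published = new ArrayList<>(); // stands in for the live system

    /** Task-level step: a successful task only stages its records. */
    void stageTaskOutput(List<String> records) {
        staged.addAll(records);
    }

    /** Job-level step: publish everything on success, discard it on failure. */
    void finishJob(boolean jobSucceeded) {
        if (jobSucceeded) {
            published.addAll(staged);
        }
        // Either way the staging area is emptied, so a failed job leaves
        // no partial records behind.
        staged.clear();
    }

    /** Returns a copy of what has been published so far. */
    List<String> published() {
        return new ArrayList<>(published);
    }
}
```

With this shape, a job in which some tasks succeed and others fail publishes nothing, which is the behavior the task-level commit in DBStorage cannot provide on its own.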





[jira] [Updated] (PIG-2031) NPE in TOP

2011-08-30 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-2031:
-

Assignee: Jacob Perkins

> NPE in TOP
> --
>
> Key: PIG-2031
> URL: https://issues.apache.org/jira/browse/PIG-2031
> Project: Pig
>  Issue Type: Bug
>Reporter: Jacob Perkins
>Assignee: Jacob Perkins
>  Labels: newbie
> Attachments: toppatch.txt
>
>
> If a NULL DataBag is passed to org.apache.pig.builtin.TOP then an NPE is 
> thrown. Consider:
> {code}
> $: cat foo.tsv
> a  {(foo,1),(bar,2)}
> b
> c  {(fyha,4),(asdf,9)}
> {code}
> then:
> {code}
> data  = LOAD 'foo.tsv' AS (key:chararray, a_bag:bag {t:tuple (name:chararray, 
> value:int)});
> tpd   = FOREACH data {
>   top_n = TOP(1, 1, a_bag);
>   GENERATE
> key   AS key,
> top_n AS top_n
>   ; 
> };
> DUMP tpd;
> {code}
> will throw an NPE when it gets to the row with no bag.
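
The kind of guard that avoids this failure can be sketched in plain Java. This is not the attached patch, whose contents are not shown in this thread; NullSafeTop is a hypothetical stand-in that models a bag as a list of integers, and returning an empty result for a missing bag (rather than null) is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a top-N function; not org.apache.pig.builtin.TOP. */
final class NullSafeTop {
    private NullSafeTop() {}

    /** Returns the n largest values, or an empty list when the bag is missing. */
    static List<Integer> top(int n, List<Integer> bag) {
        // Guard first: a row with no bag yields no results instead of an NPE.
        if (bag == null || n <= 0) {
            return Collections.emptyList();
        }
        List<Integer> sorted = new ArrayList<>(bag); // copy so the input is not reordered
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }
}
```

The sketch does not handle null elements inside a non-null bag; those would need a separate decision.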





[jira] [Updated] (PIG-1748) Add load/store function AvroStorage for avro data

2011-07-05 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Assignee: lin guo  (was: Jakob Homan)

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
>Assignee: lin guo
> Fix For: 0.9.0
>
> Attachments: PIG-1748-2.patch, PIG-1748-3.patch, avro_storage.patch, 
> avro_test_files.tar.gz
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data





[jira] [Commented] (PIG-1890) Fix piggybank unit test TestAvroStorage

2011-05-09 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030845#comment-13030845
 ] 

Jakob Homan commented on PIG-1890:
--

@Ken - any update now that we're in a new week?

> Fix piggybank unit test TestAvroStorage
> ---
>
> Key: PIG-1890
> URL: https://issues.apache.org/jira/browse/PIG-1890
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.9.0
>Reporter: Daniel Dai
>Assignee: Jakob Homan
> Fix For: 0.9.0
>
> Attachments: PIG-1890-1.patch
>
>
> TestAvroStorage fails on trunk. There are two reasons:
> 1. After PIG-1680, we call LoadFunc.setLocation one more time.
> 2. The schema for AvroStorage seems to be wrong. For example, in the first test 
> case, testArrayDefault, the schema for "in" is set to "PIG_WRAPPER: (FIELD: 
> {PIG_WRAPPER: (ARRAY_ELEM: float)})". It seems PIG_WRAPPER is redundant. This 
> issue was hidden until PIG-1188 was checked in.



[jira] Updated: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1872:
-

Priority: Major  (was: Minor)

> Fix bug in AvroStorage
> --
>
> Key: PIG-1872
> URL: https://issues.apache.org/jira/browse/PIG-1872
> Project: Pig
>  Issue Type: Bug
>Affects Versions: site
>Reporter: lin guo
> Fix For: site
>
> Attachments: my.diff
>
>
> AvroStorageUtils.containsRecursiveRecord() has a bug and returns true for a 
> record with multiple fields of the same type, e.g.
>  { "type":"record", "name":"Event", " +
> "fields":[{"name":"f1", "type":{ "type":"record","name":"EntityID", }}
>   {"name":"f2","type":"EntityID"}, " +
>   {"name":"f3","type":"EntityID"} ]}
>
> Patch contains bug fix and unit tests.
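
The description does not say what caused the false positive. One plausible cause, assumed here purely for illustration, is remembering every record name ever visited, so that a second field of type EntityID looks like a cycle. The plain-Java sketch below instead tracks only the names on the current path from the root; the Record class is a minimal stand-in for a named schema and is not the Avro API, and this is not the code in my.diff.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; not AvroStorageUtils and not the attached patch. */
final class RecursiveRecordCheck {
    private RecursiveRecordCheck() {}

    /** Minimal stand-in for a named record schema. */
    static final class Record {
        final String name;
        final List<Record> fieldTypes = new ArrayList<>();

        Record(String name) {
            this.name = name;
        }
    }

    /** Returns true only if some record contains itself, directly or indirectly. */
    static boolean containsRecursiveRecord(Record root) {
        return visit(root, new HashSet<String>());
    }

    private static boolean visit(Record record, Set<String> onPath) {
        if (!onPath.add(record.name)) {
            return true; // the record is one of its own ancestors
        }
        for (Record fieldType : record.fieldTypes) {
            if (visit(fieldType, onPath)) {
                return true;
            }
        }
        // Leaving this branch: sibling fields may reuse the same type.
        onPath.remove(record.name);
        return false;
    }
}
```

With this approach the Event schema above, whose fields f1, f2 and f3 all share the EntityID type, is not reported as recursive, while a record that has a field of its own type still is.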





[jira] Updated: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1872:
-

Fix Version/s: (was: 0.9.0)
   site
Affects Version/s: (was: 0.9.0)
   site
   Status: Patch Available  (was: Open)

> Fix bug in AvroStorage
> --
>
> Key: PIG-1872
> URL: https://issues.apache.org/jira/browse/PIG-1872
> Project: Pig
>  Issue Type: Bug
>Affects Versions: site
>Reporter: lin guo
>Priority: Minor
> Fix For: site
>
> Attachments: my.diff
>
>
> AvroStorageUtils.containsRecursiveRecord() has a bug and returns true for a 
> record with multiple fields of the same type, e.g.
>  { "type":"record", "name":"Event", " +
> "fields":[{"name":"f1", "type":{ "type":"record","name":"EntityID", }}
>   {"name":"f2","type":"EntityID"}, " +
>   {"name":"f3","type":"EntityID"} ]}
>
> Patch contains bug fix and unit tests.





[jira] Commented: (PIG-1872) Fix bug in AvroStorage

2011-02-25 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999689#comment-12999689
 ] 

Jakob Homan commented on PIG-1872:
--

+1.  Looks good to me.

> Fix bug in AvroStorage
> --
>
> Key: PIG-1872
> URL: https://issues.apache.org/jira/browse/PIG-1872
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.9.0
>Reporter: lin guo
>Priority: Minor
> Fix For: 0.9.0
>
> Attachments: my.diff
>
>
> AvroStorageUtils.containsRecursiveRecord() has a bug and returns true for a 
> record with multiple fields of the same type, e.g.
>  { "type":"record", "name":"Event", " +
> "fields":[{"name":"f1", "type":{ "type":"record","name":"EntityID", }}
>   {"name":"f2","type":"EntityID"}, " +
>   {"name":"f3","type":"EntityID"} ]}
>
> Patch contains bug fix and unit tests.





[jira] Created: (PIG-1833) Contrib's build.xml points to an invalid hadoop-conf

2011-01-28 Thread Jakob Homan (JIRA)
Contrib's build.xml points to an invalid hadoop-conf


 Key: PIG-1833
 URL: https://issues.apache.org/jira/browse/PIG-1833
 Project: Pig
  Issue Type: Bug
Reporter: Jakob Homan


As discovered in testing PIG-1748, the build.xml in the contrib/piggybank/java 
module has {{junit.hadoop.conf}} which points to 
{{"${user.home}/pigtest/conf/"}}.  In this directory is a hadoop-conf.xml that 
defines a value for {{fs.default.name}} which is valid during the regular test 
runs but not for the contrib modules.  However, any tests in contrib that try 
to access a non-fully qualified file via FileSystem will be routed to this 
value and will then fail when they can't reach it.  If, however, one runs the 
tests directly from contrib module without the pigtest directory existing, the 
tests will pass.  Do any of the contrib modules actually need this variable?  
If not, it should be removed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-28 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Attachment: PIG-1748-3.patch

Figured out the test failures.  It turns out that when one does a full run of 
the unit tests (which I cannot get to succeed on my machine), the ~/pigtest 
directory is left behind during the contrib tests, and the contrib build.xml 
file has a {{junit.hadoop.conf}} variable pointing those tests to the HDFS 
instance that the Pig tests had running but that is no longer up.  This conf 
trickles down to the test, which ends up using it as the default filesystem and 
tries to connect to it, but can't since that HDFS is gone.  This doesn't occur 
when run through an IDE like IntelliJ since the IDE doesn't use contrib's 
build.xml settings.  

I've fixed this by explicitly referencing the local file system in the tests, 
though this seems like a bug in the contrib build system to me.  I'll open a 
JIRA to address this.
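
Why an explicit local reference sidesteps the problem can be shown with a small plain-Java model built on java.net.URI. It is only a model: PathQualification is a hypothetical class, the hdfs://localhost:9000/ address is made up, and real Hadoop path resolution has more rules than this. An unqualified path inherits the scheme and authority of the configured default filesystem, while a path that already carries a scheme is left alone.

```java
import java.net.URI;

/** Hypothetical model of default-filesystem resolution; not Hadoop code. */
final class PathQualification {
    private PathQualification() {}

    /** Qualifies a path against the default filesystem unless it already has a scheme. */
    static URI qualify(String path, URI defaultFs) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri; // already fully qualified, e.g. file:///tmp/data.avro
        }
        // No scheme: the default filesystem supplies scheme and authority.
        return defaultFs.resolve(uri);
    }

    public static void main(String[] args) {
        URI staleHdfs = URI.create("hdfs://localhost:9000/"); // made-up fs.default.name value
        System.out.println(qualify("/tmp/data.avro", staleHdfs));
        System.out.println(qualify("file:///tmp/data.avro", staleHdfs));
    }
}
```

In Hadoop itself the analogous step would presumably be to ask for the local file system explicitly, for example through FileSystem.getLocal, rather than relying on the configured default; what the patch actually does is not shown in this thread.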

@Felix - good catch.  To provide a cleaner separation between my work and 
Lin's, I would like to go ahead and fix this bug in a separate JIRA after 1748 
is committed.  How does this sound to you?

Contrib tests pass, except org.apache.pig.piggybank.test.TestPigStorageSchema, 
which fails for me with or without the patch.  Version 3 of the patch is 
updated to include better behavior for directories with files that should be 
filtered out.

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
>Assignee: Jakob Homan
> Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch, PIG-1748-3.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data




[jira] Commented: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-24 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985972#action_12985972
 ] 

Jakob Homan commented on PIG-1748:
--

@Scott
I can't say I'm convinced, and am in fact more concerned by your example, 
given that this approach essentially builds dependencies on all of those 
projects into Avro.  However, this JIRA isn't the best place to discuss this.  
Is there a discussion about this type of integration going on in Avro that the 
community can contribute to?  Is there a JIRA?  Thanks.

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
>Assignee: Jakob Homan
> Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data




[jira] Commented: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-24 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985936#action_12985936
 ] 

Jakob Homan commented on PIG-1748:
--

@Daniel- Let me take a look.

@Scott - It's worth noting that projects can include Avro support as they wish, 
just as Avro can incorporate that work as it wishes.  But I'm not sure I 
understand.  You're saying that you'd rather any higher-level application 
supporting Avro have that support hosted in Avro, rather than treating Avro as 
a library to be included?  This seems like an odd approach to me, essentially 
moving the domain knowledge of each application into Avro, rather than keeping 
it in the application itself where its developers frolic and work.  Is there 
something I'm missing here?

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
>Assignee: Jakob Homan
> Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data




[jira] Created: (PIG-1816) Piggybank's Hive loaders should rely on Ivy rather than downloading Hive manually

2011-01-20 Thread Jakob Homan (JIRA)
Piggybank's Hive loaders should rely on Ivy rather than downloading Hive 
manually
-

 Key: PIG-1816
 URL: https://issues.apache.org/jira/browse/PIG-1816
 Project: Pig
  Issue Type: Improvement
Reporter: Jakob Homan


Currently the Hive components in Piggybank download, extract, rename and copy 
the necessary Hive jars manually in Ant via the {{download-hive-deps}} target 
(incidentally, it's downloading 0.4, an ancient version). This plays havoc with 
IDEs that may be trying to load the Hive classes and resolve dependencies 
automatically through Hive, and is also fragile.  This component should be 
updated to use Ivy to resolve the Hive dependency.




[jira] Commented: (PIG-1749) Update Pig parser so that function arguments can contain newline characters

2011-01-19 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983980#action_12983980
 ] 

Jakob Homan commented on PIG-1749:
--

Also, regarding accepting newlines in places other than function args, I'd like 
to handle that in a different JIRA.  It seems intuitive to me to be able to do 
so in function args because they're essentially strings being passed in.  Other 
locations, like those mentioned on RB are more debatable and should be aired 
out in a more visible JIRA.

> Update Pig parser so that function arguments can contain newline characters
> ---
>
> Key: PIG-1749
> URL: https://issues.apache.org/jira/browse/PIG-1749
> Project: Pig
>  Issue Type: Improvement
>Reporter: lin guo
> Attachments: parser.patch, PIG-1749-2.patch
>
>
> We want to add this feature so that users can put long function argument 
> strings in multiple lines. PIG-1748 depends on this. 




[jira] Updated: (PIG-1749) Update Pig parser so that function arguments can contain newline characters

2011-01-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1749:
-

Attachment: PIG-1749-2.patch

I'll be finishing this patch for Lin.  Updated patch.  Added a test to 
TestQueryParser, which required updating the new grammar (my grammar skills are 
a bit rusty), removed the reference to AvroStorage, but left in its details.  
These references don't actually call out to the class and are just used for 
filler purposes.  

> Update Pig parser so that function arguments can contain newline characters
> ---
>
> Key: PIG-1749
> URL: https://issues.apache.org/jira/browse/PIG-1749
> Project: Pig
>  Issue Type: Improvement
>Reporter: lin guo
> Attachments: parser.patch, PIG-1749-2.patch
>
>
> We want to add this feature so that users can put long function argument 
> strings in multiple lines. PIG-1748 depends on this. 




[jira] Updated: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Attachment: PIG-1748-2.patch

Attaching updated patch.  I'll be finishing this JIRA for Lin.  Delta from her 
patch: Avro 1.4, test refactoring and brought in line with Pig coding 
conventions.  Ready for review.

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
> Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data




[jira] Updated: (PIG-1748) Add load/store function AvroStorage for avro data

2011-01-19 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated PIG-1748:
-

Attachment: avro_test_files.tar.gz

Attaching binary test avro files used by unit tests.  Need to be untgz'ed and 
placed in 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files
 by reviewer/committer

> Add load/store function AvroStorage for avro data
> -
>
> Key: PIG-1748
> URL: https://issues.apache.org/jira/browse/PIG-1748
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: lin guo
> Attachments: avro_storage.patch, avro_test_files.tar.gz, 
> PIG-1748-2.patch
>
>
> We want to use Pig to process arbitrary Avro data and store results as Avro 
> files. AvroStorage() extends two PigFuncs: LoadFunc and StoreFunc. 
> Due to discrepancies of Avro and Pig data models, AvroStorage has:
> 1. Limited support for "record": we do not support recursively defined record 
> because the number of fields in such records is data dependent.
> 2. Limited support for "union": we only accept nullable union like ["null", 
> "some-type"].
> For simplicity, we also make the following assumptions:
> If the input directory is a leaf directory, then we assume Avro data files in 
> it have the same schema;
> If the input directory contains sub-directories, then we assume Avro data 
> files in all sub-directories have the same schema.
> AvroStorage takes no input parameters when used as a LoadFunc (except for 
> "debug [debug-level]"). 
> Users can provide parameters to AvroStorage when used as a StoreFunc. If they 
> don't, Avro schema of output data is derived from its 
> Pig schema.
> Detailed documentation can be found in 
> http://snaprojects.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data
