[jira] Assigned: (AVRO-517) Resolving Decoder fails in some cases

2010-04-15 Thread Thiruvalluvan M. G. (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvalluvan M. G. reassigned AVRO-517:


Assignee: Thiruvalluvan M. G.

> Resolving Decoder fails in some cases
> -
>
> Key: AVRO-517
> URL: https://issues.apache.org/jira/browse/AVRO-517
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.3.2
> Reporter: Scott Carey
> Assignee: Thiruvalluvan M. G.
> Priority: Critical
>
> User reports that reading an 'actual' schema of 
>  string, string, int
> fails when using an expected schema of:
>  string, string
> Sample code and details in the comments.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Questions re integrating Avro into Cascading process

2010-04-15 Thread Ken Krugler

Hi all,

We're looking at creating a Cascading Scheme for Avro, and have a
few questions below. These are very general, as this is more of a
scoping phase (as in, are we crazy to try this?), so apologies in
advance for the lack of detail.


For context, Cascading is an open source project that provides a  
workflow API on top of Hadoop. The key unit of data is a tuple, which  
corresponds to a record - you have fields (names) and values.  
Cascading uses a generalized "tap" concept for reading & writing  
tuples, where a tap uses a scheme to handle the low-level mapping from  
Cascading-land to/from the storage format.


So the goal here is to define a Cascading Scheme that will run on  
0.18.3 and later versions of Hadoop, and provide general support for  
reading/writing tuples from/to an Avro-format Hadoop part-x file.


We grabbed the recently committed AvroXXX code from  
org.apache.avro.mapred (thanks Doug & Scott), and began building the  
Cascading scheme to bridge between AvroWrapper keys and Cascading  
tuples.


1. What's the best approach if we want to dynamically define the Avro  
schema, based on a list of field names and types (classes)?


This assumes it's possible to dynamically define & use a schema, of  
course.
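
One way question 1 could be approached, sketched under stated assumptions: build the record schema as JSON from an ordered map of field names to Java classes, then hand the resulting string to Avro's `Schema.parse(String)`. The class `TupleSchemaSketch`, its method names, and the class-to-type mapping are hypothetical, chosen only for illustration; the sketch itself uses nothing but the JDK, so it stops at the JSON string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives an Avro record schema (as JSON) from field names and classes. */
final class TupleSchemaSketch {

    /** Maps a Java class to an Avro primitive type name. This mapping is an assumption. */
    static String avroType(Class<?> type) {
        if (type == String.class) return "string";
        if (type == Integer.class) return "int";
        if (type == Long.class) return "long";
        if (type == Float.class) return "float";
        if (type == Double.class) return "double";
        if (type == Boolean.class) return "boolean";
        throw new IllegalArgumentException("No Avro mapping for " + type.getName());
    }

    /** Avro names must match [A-Za-z_][A-Za-z0-9_]*, so no JSON escaping is needed once checked. */
    static String checkName(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Not a valid Avro name: " + name);
        }
        return name;
    }

    /** Builds the schema JSON, keeping fields in the iteration order of the map. */
    static String recordSchemaJson(String recordName, Map<String, Class<?>> fields) {
        StringBuilder json = new StringBuilder();
        json.append("{\"type\":\"record\",\"name\":\"")
            .append(checkName(recordName))
            .append("\",\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, Class<?>> field : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append("{\"name\":\"").append(checkName(field.getKey()))
                .append("\",\"type\":\"").append(avroType(field.getValue()))
                .append("\"}");
        }
        return json.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("First", String.class);
        fields.put("Last", String.class);
        fields.put("Age", Integer.class);
        // The printed string is what would be passed to Schema.parse(String).
        System.out.println(recordSchemaJson("Person", fields));
    }
}
```

An insertion-ordered map is used because Avro record fields are positional in the binary encoding, so the field order of the tuple has to be preserved in the schema.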


2. How much has the new Hadoop map-reduce support code been tested?

3. Will there be issues with running in 0.18.3, 0.19.2, etc?

I saw some discussion about Hadoop using the older Jackson 1.0.1 jar,  
and that then creating problems. Anything else?


4. The key integration point, besides the fields+classes to schema
issue above, is mapping between Cascading tuples and AvroWrapper.


If we're using (I assume) the generic format, any input on how we'd do  
this two-way conversion?
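
A rough sketch of what the two-way conversion in question 4 might look like, with loud assumptions: a tuple is modelled as a positional `List<Object>` paired with a list of field names, and an insertion-ordered `Map<String, Object>` stands in for an Avro generic record (which is populated by name via `put(String, Object)` and read via `get(String)`). `TupleRecordSketch` and its methods are hypothetical; neither Cascading's nor Avro's classes are used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: positional tuple values to/from a name-keyed record. */
final class TupleRecordSketch {

    /** Tuple to record: pair each positional value with its field name. */
    static Map<String, Object> toRecord(List<String> fieldNames, List<Object> tuple) {
        if (fieldNames.size() != tuple.size()) {
            throw new IllegalArgumentException(
                "Expected " + fieldNames.size() + " values, got " + tuple.size());
        }
        Map<String, Object> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            record.put(fieldNames.get(i), tuple.get(i));
        }
        return record;
    }

    /** Record to tuple: look each field up by name, in the tuple's field order. */
    static List<Object> toTuple(List<String> fieldNames, Map<String, Object> record) {
        List<Object> tuple = new ArrayList<>();
        for (String name : fieldNames) {
            if (!record.containsKey(name)) {
                throw new IllegalArgumentException("Record has no field " + name);
            }
            tuple.add(record.get(name));
        }
        return tuple;
    }
}
```

With a real generic record in place of the map, the same two loops would apply; any value conversion between the two libraries' types (for example between `String` and Avro's `Utf8`) would be an extra step that this sketch leaves out.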


Thanks!

-- Ken


Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






[jira] Created: (AVRO-518) make check in c++ is broken because of typo & missing boost_filesystem library

2010-04-15 Thread John Plevyak (JIRA)

make check in c++ is broken because of typo & missing boost_filesystem library
--

 Key: AVRO-518
 URL: https://issues.apache.org/jira/browse/AVRO-518
 Project: Avro
  Issue Type: Bug
  Components: c++
 Environment: linux w/boost 1.42
Reporter: John Plevyak
 Attachments: avro-cpp-buffer-jp-v1.patch

"make check" in c++ is broken because of typo & missing boost_filesystem 
library.

The typo transposes BOOST and HAVE in
api/buffer/detail/BufferDetailIterator.hh:

-#ifdef BOOST_HAVE_ASIO
+#ifdef HAVE_BOOST_ASIO

Fixing the missing library requires adding a new m4 macro.

I will include a patch.





[jira] Updated: (AVRO-518) make check in c++ is broken because of typo & missing boost_filesystem library

2010-04-15 Thread John Plevyak (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Plevyak updated AVRO-518:
--

Attachment: avro-cpp-buffer-jp-v1.patch

minimal patch to get "make check" to work on trunk

> make check in c++ is broken because of typo & missing boost_filesystem library
> --
>
> Key: AVRO-518
> URL: https://issues.apache.org/jira/browse/AVRO-518
> Project: Avro
> Issue Type: Bug
> Components: c++
> Environment: linux w/boost 1.42
> Reporter: John Plevyak
> Attachments: avro-cpp-buffer-jp-v1.patch
>
>
> "make check" in c++ is broken because of typo & missing boost_filesystem 
> library.
> The typo is inverting BOOST and HAVE in 
> api/buffer/detail/BufferDetailIterator.hh 
> -#ifdef BOOST_HAVE_ASIO
> +#ifdef HAVE_BOOST_ASIO
> The missing library requires adding a new m4 macro.
> I will include a patch.





[jira] Commented: (AVRO-517) Resolving Decoder fails in some cases

2010-04-15 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857417#action_12857417
 ] 

Scott Carey commented on AVRO-517:
--

Sample code that shows this issue:

{code}
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericData.Record;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.util.Utf8;

public class AddressBook {
    String fileName = "AddressBook.db";
    String prefix = "{\"type\":\"record\",\"name\":\"Person\",\"fields\":[";
    String suffix = "]}";
    String fieldFirst = "{\"name\":\"First\",\"type\":\"string\"}";
    String fieldLast = "{\"name\":\"Last\",\"type\":\"string\"}";
    String fieldAge = "{\"name\":\"Age\",\"type\":\"int\"}";

    // 'Actual' (writer) schema: string, string, int
    Schema personSchema = Schema.parse(
            prefix + fieldFirst + "," + fieldLast + "," + fieldAge + suffix);
    // 'Expected' (reader) schema: int only
    Schema ageSchema = Schema.parse(prefix + fieldAge + suffix);
    // 'Expected' (reader) schema: string, string
    Schema extractSchema = Schema.parse(
            prefix + fieldFirst + "," + fieldLast + suffix);

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        AddressBook ab = new AddressBook();
        ab.init();
        ab.browseAge();
        ab.browseName();
    }

    // Writes three Person records using the full (writer) schema.
    public void init() throws IOException {
        DataFileWriter<Record> writer = new DataFileWriter<Record>(
                new GenericDatumWriter<Record>(personSchema)).create(
                personSchema, new File(fileName));
        try {
            writer.append(createPerson("Dante", "Hicks", 27));
            writer.append(createPerson("Randal", "Graves", 20));
            writer.append(createPerson("Steve", "Jobs", 31));
        } finally {
            writer.close();
        }
    }

    private Record createPerson(String first, String last, int age) {
        Record person = new GenericData.Record(personSchema);
        person.put("First", new Utf8(first));
        person.put("Last", new Utf8(last));
        person.put("Age", age);
        return person;
    }

    // Reads only the Age field.
    public void browseAge() throws IOException {
        GenericDatumReader<Record> dr = new GenericDatumReader<Record>();
        dr.setExpected(ageSchema);
        DataFileReader<Record> reader =
                new DataFileReader<Record>(new File(fileName), dr);

        try {
            while (reader.hasNext()) {
                Record person = reader.next();
                System.out.println(person.get("Age").toString());
            }
        } finally {
            reader.close();
        }
    }

    // Reads only First and Last. In the user's run quoted below, this is the
    // method that throws after printing the first record.
    public void browseName() throws IOException {
        GenericDatumReader<Record> dr = new GenericDatumReader<Record>();
        dr.setExpected(extractSchema);
        DataFileReader<Record> reader =
                new DataFileReader<Record>(new File(fileName), dr);

        try {
            while (reader.hasNext()) {
                Record person = reader.next();
                System.out.println(person.get("First").toString() + " "
                        + person.get("Last").toString() + "\t");
            }
        } finally {
            reader.close();
        }
    }
}
{code}


User comments:
{quote}
Hi,
27
20
31
Dante Hicks 
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.EOFException
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:184)
    at cn.znest.test.avro.AddressBook.browseName(AddressBook.java:91)
    at cn.znest.test.avro.AddressBook.main(AddressBook.java:43)
Caused by: java.io.EOFException
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:163)
    at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:262)
    at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:93)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:277)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:271)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:83)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:105)
at 
org.apach

[jira] Created: (AVRO-517) Resolving Decoder fails in some cases

2010-04-15 Thread Scott Carey (JIRA)

Resolving Decoder fails in some cases
-

 Key: AVRO-517
 URL: https://issues.apache.org/jira/browse/AVRO-517
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.3.2
Reporter: Scott Carey
Priority: Critical


User reports that reading an 'actual' schema of 
 string, string, int
fails when using an expected schema of:
 string, string

Sample code and details in the comments.
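
To illustrate why a reader projecting string, string out of string, string, int data still has to consume the trailing int, here is a stand-alone sketch. It is not Avro's ResolvingDecoder and makes no claim about the actual cause of this bug; it only hand-rolls Avro's binary encoding for ints (zigzag plus variable-length bytes) and strings (length followed by UTF-8 bytes). The class `ProjectionSketch` and all of its method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; it does not use or reproduce Avro's decoders. */
final class ProjectionSketch {

    /** Avro int encoding: zigzag, then little-endian base-128 with continuation bits. */
    static void writeInt(ByteArrayOutputStream out, int value) {
        int n = (value << 1) ^ (value >> 31);
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }

    /** Avro string encoding: byte length (same bytes as an int for small values), then UTF-8. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeInt(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static int readInt(ByteBuffer in) {
        int result = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (result >>> 1) ^ -(result & 1); // undo zigzag
    }

    static String readString(ByteBuffer in) {
        byte[] utf8 = new byte[readInt(in)];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Encodes three records with the 'actual' schema: string, string, int. */
    static byte[] encodeSample() {
        String[][] names = {{"Dante", "Hicks"}, {"Randal", "Graves"}, {"Steve", "Jobs"}};
        int[] ages = {27, 20, 31};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < names.length; i++) {
            writeString(out, names[i][0]);
            writeString(out, names[i][1]);
            writeInt(out, ages[i]);
        }
        return out.toByteArray();
    }

    /** Reads with the 'expected' schema string, string; skipAge controls the trailing int. */
    static List<String> readNames(byte[] data, boolean skipAge) {
        ByteBuffer in = ByteBuffer.wrap(data);
        List<String> names = new ArrayList<>();
        while (in.hasRemaining()) {
            String first = readString(in);
            String last = readString(in);
            if (skipAge) {
                readInt(in); // consume the writer-only field so the next record lines up
            }
            names.add(first + " " + last);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] data = encodeSample();
        System.out.println(readNames(data, true));
        try {
            readNames(data, false);
        } catch (BufferUnderflowException e) {
            System.out.println("Without skipping Age, the second record is misread");
        }
    }
}
```

When the trailing field is not consumed, the age byte of the first record is interpreted as the length of the next string, and the reader runs off the end of the data. That is at least consistent with the shape of the user's report (one good record, then an end-of-file error while reading a string), though the real fault may lie elsewhere.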

