Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-06 Thread Scott Carey
I have very little knowledge about GSOC and do not know what is considered a 
project there.  

Avro doesn't define 'projects'.  Its JIRA tickets can be items that take an 
hour to do or a month.
Some of the tasks related to converting CSV, XML, or JSON data into or out 
of Avro are very simple, and some are not.  The scope of all three, including a 
unified tool for conversion and testing, is a fairly large overall project.  

JSON would be the easiest, since Avro already supports serialization to and 
from it.
XML has the potential to be the most complete and flexible.  It is a lot of 
work but would use standard APIs and tools.
CSV has the most restrictions -- it can't support recursive schemas or unions 
well.  Additionally it can have some tricky corner cases since it is not a 
strict standard.  It will likely be the most popular 'import into avro' 
variation, however. 
The tools for conversion (command line or other) and the testing required for 
all of the above are non-trivial, mostly because such a tool has to be very 
good at clear error handling and reporting or it won't be very useful in the 
real world.
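
As a rough illustration of the CSV restrictions and the error-reporting point
above, here is a minimal sketch. It does not use the Avro library: the schema
is a plain Python dict in Avro's JSON schema form, and the helper names
(csv_incompatible_fields, read_rows, CASTS) are hypothetical, chosen only for
this example. It assumes a header row and handles only a few primitive types.

```python
import csv
import io

# Avro's primitive type names. Everything else (records, arrays, maps,
# general unions) is treated here as not representable in a flat CSV row.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

# Assumption: only these types are converted in this sketch.
CASTS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def csv_incompatible_fields(schema):
    """Return names of record fields that a flat CSV row cannot represent."""
    bad = []
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, list):
            # Accept only the optional form ["null", <primitive>].
            ok = len(ftype) == 2 and "null" in ftype and all(
                isinstance(t, str) and t in PRIMITIVES for t in ftype)
        else:
            ok = isinstance(ftype, str) and ftype in PRIMITIVES
        if not ok:
            bad.append(field["name"])
    return bad


def read_rows(text, schema):
    """Parse CSV text into dicts, collecting errors with line numbers."""
    rows, errors = [], []
    reader = csv.DictReader(io.StringIO(text))
    for record in reader:
        row = {}
        for field in schema["fields"]:
            name, ftype = field["name"], field["type"]
            raw = record.get(name)
            try:
                if raw is None:
                    raise ValueError("missing column")
                row[name] = CASTS[ftype](raw)
            except (KeyError, TypeError, ValueError):
                # Report the problem and skip the row instead of aborting.
                errors.append(
                    f"line {reader.line_num}: field '{name}': "
                    f"cannot read {raw!r} as {ftype}")
                break
        else:
            rows.append(row)
    return rows, errors
```

A real tool would also need to deal with quoting dialects, encodings, and
optional fields, which is where most of the tricky corner cases come from.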

-Scott

On Apr 5, 2010, at 6:03 PM, Zheng Yang wrote:

> Hi, Scott,
> 
> as you can see, these three tools can be eventually combined into one
> multifunctional tool which accepts different format(csv,xml,json..).
> since I'm also going to submit a proposal for this , may i know are they
> considered to be one project or three?
> 
> Yang
> 
> On Tue, Apr 6, 2010 at 7:43 AM, Scott Carey  wrote:
> 
>> FYI, it seems as though at least one other person has chosen a similar task
>> for GSOC:
>> 
>> 
>> 
>> http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e
>> 
>> On Apr 5, 2010, at 4:31 PM, Jasintha Dasanayaka wrote:
>> 
>>> Hey..!
>>> I  am going to submit proposal for AVRO-457 can you do it AVRO-458 only
>>> 
>>> On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang  wrote:
>>> 
>>>> Hi all,
>>>>
>>>> This is Hua Huang, a CS master student from Simon Fraser University,
>>>> Canada.
>>>> I am going to participate in the Google Summer of Code 2010 and I also
>>>> find out that several projects of AVRO are quite interesting, especially
>>>> AVRO-456(add tools that read/write json records from/to avro data files)
>>>> together with AVRO-457 and AVRO-458.
>>>>
>>>> I plan to submit a proposal for these projects which would produce a
>>>> C/C++ command-line tool to support transformation between AVRO data and
>>>> other types of data, like CSV, Json or XML. My key idea is to use
>>>> parallel bit stream technology to speed up the parsing procedure in
>>>> order to build a high performance tool which will be very useful in
>>>> practical, especially in the large-scale dataset.
>>>>
>>>> I sent an email to Doug Cutting(cutt...@apache.org) who is the reporter
>>>> of these projects, but I haven't received any reply yet. So I am
>>>> wondering, is there anybody who can communicate with me for the details
>>>> of the projects, or even suggest me a person so that I could contact
>>>> with him/her for the details?
>>>>
>>>> Any feedback is really appreciated. Thank you very much.
>>>>
>>>> Yours Sincerely,
>>>>
>>>> Hua
>>>>
>>> 
>>> 
>>> --
>>> Jasintha Dasanayaka
>>> +94 772 916 596
>>> +94 472 232 139
>>> http://www.jasintha.info
>>> jasint...@gmail.com
>> 
>> 
> 
> 
> -- 
> School of Computing / Computing / Year 2
> National University of Singapore



Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Jasintha Dasanayaka
I think they are three projects.

On Tue, Apr 6, 2010 at 6:33 AM, Zheng Yang  wrote:

> Hi, Scott,
>
> as you can see, these three tools can be eventually combined into one
> multifunctional tool which accepts different format(csv,xml,json..).
> since I'm also going to submit a proposal for this , may i know are they
> considered to be one project or three?
>
> Yang
>
> On Tue, Apr 6, 2010 at 7:43 AM, Scott Carey wrote:
>
> > FYI, it seems as though at least one other person has chosen a similar
> > task for GSOC:
> >
> > http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e
> >
> > On Apr 5, 2010, at 4:31 PM, Jasintha Dasanayaka wrote:
> >
> > > Hey..!
> > > I  am going to submit proposal for AVRO-457 can you do it AVRO-458 only
> > >
> > > On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang  wrote:
> > >
> > >> Hi all,
> > >>
> > >> This is Hua Huang, a CS master student from Simon Fraser University,
> > >> Canada.
> > >> I am going to participate in the Google Summer of Code 2010 and I
> > >> also find out that several projects of AVRO are quite interesting,
> > >> especially AVRO-456(add tools that read/write json records from/to
> > >> avro data files) together with AVRO-457 and AVRO-458.
> > >>
> > >> I plan to submit a proposal for these projects which would produce a
> > >> C/C++ command-line tool to support transformation between AVRO data
> > >> and other types of data, like CSV, Json or XML. My key idea is to use
> > >> parallel bit stream technology to speed up the parsing procedure in
> > >> order to build a high performance tool which will be very useful in
> > >> practical, especially in the large-scale dataset.
> > >>
> > >> I sent an email to Doug Cutting(cutt...@apache.org) who is the
> > >> reporter of these projects, but I haven't received any reply yet. So
> > >> I am wondering, is there anybody who can communicate with me for the
> > >> details of the projects, or even suggest me a person so that I could
> > >> contact with him/her for the details?
> > >>
> > >> Any feedback is really appreciated. Thank you very much.
> > >>
> > >> Yours Sincerely,
> > >>
> > >> Hua
> > >>
> > >
> > >
> > > --
> > > Jasintha Dasanayaka
> > > +94 772 916 596
> > > +94 472 232 139
> > > http://www.jasintha.info
> > > jasint...@gmail.com
> >
> >
>
>
> --
> School of Computing / Computing / Year 2
> National University of Singapore
>



-- 
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com


Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Zheng Yang
Hi, Scott,

As you can see, these three tools could eventually be combined into one
multifunctional tool that accepts different formats (CSV, XML, JSON, ...).
Since I'm also going to submit a proposal for this, may I know whether they
are considered to be one project or three?

Yang

On Tue, Apr 6, 2010 at 7:43 AM, Scott Carey  wrote:

> FYI, it seems as though at least one other person has chosen a similar task
> for GSOC:
>
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e
>
> On Apr 5, 2010, at 4:31 PM, Jasintha Dasanayaka wrote:
>
> > Hey..!
> > I  am going to submit proposal for AVRO-457 can you do it AVRO-458 only
> >
> > On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang  wrote:
> >
> >> Hi all,
> >>
> >> This is Hua Huang, a CS master student from Simon Fraser University,
> >> Canada.
> >> I am going to participate in the Google Summer of Code 2010 and I also
> >> find out that several projects of AVRO are quite interesting,
> >> especially AVRO-456(add tools that read/write json records from/to
> >> avro data files) together with AVRO-457 and AVRO-458.
> >>
> >> I plan to submit a proposal for these projects which would produce a
> >> C/C++ command-line tool to support transformation between AVRO data
> >> and other types of data, like CSV, Json or XML. My key idea is to use
> >> parallel bit stream technology to speed up the parsing procedure in
> >> order to build a high performance tool which will be very useful in
> >> practical, especially in the large-scale dataset.
> >>
> >> I sent an email to Doug Cutting(cutt...@apache.org) who is the
> >> reporter of these projects, but I haven't received any reply yet. So I
> >> am wondering, is there anybody who can communicate with me for the
> >> details of the projects, or even suggest me a person so that I could
> >> contact with him/her for the details?
> >>
> >> Any feedback is really appreciated. Thank you very much.
> >>
> >> Yours Sincerely,
> >>
> >> Hua
> >>
> >
> >
> > --
> > Jasintha Dasanayaka
> > +94 772 916 596
> > +94 472 232 139
> > http://www.jasintha.info
> > jasint...@gmail.com
>
>


-- 
School of Computing / Computing / Year 2
National University of Singapore


Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Scott Carey
FYI, it seems as though at least one other person has chosen a similar task for 
GSOC:


http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e

On Apr 5, 2010, at 4:31 PM, Jasintha Dasanayaka wrote:

> Hey..!
> I  am going to submit proposal for AVRO-457 can you do it AVRO-458 only
> 
> On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang  wrote:
> 
>> Hi all,
>> 
>> 
>> 
>> This is Hua Huang, a CS master student from Simon Fraser University,
>> Canada.
>> I am going to participate in the Google Summer of Code 2010 and I also find
>> out that several projects of AVRO are quite interesting, especially
>> AVRO-456(add tools that read/write json records from/to avro data files)
>> together with AVRO-457 and AVRO-458.
>> 
>> 
>> 
>> I plan to submit a proposal for these projects which would produce a C/C++
>> command-line tool to support transformation between AVRO data and other
>> types of data, like CSV, Json or XML. My key idea is to use parallel bit
>> stream technology to speed up the parsing procedure in order to build a
>> high
>> performance tool which will be very useful in practical, especially in the
>> large-scale dataset.
>> 
>> 
>> 
>> I sent an email to Doug Cutting(cutt...@apache.org) who is the reporter of
>> these projects, but I haven't received any reply yet. So I am wondering, is
>> there anybody who can communicate with me for the details of the projects,
>> or even suggest me a person so that I could contact with him/her for the
>> details?
>> 
>> 
>> 
>> Any feedback is really appreciated. Thank you very much.
>> 
>> 
>> 
>> Yours Sincerely,
>> 
>> Hua
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> Jasintha Dasanayaka
> +94 772 916 596
> +94 472 232 139
> http://www.jasintha.info
> jasint...@gmail.com



Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Jasintha Dasanayaka
Hey..!
I am going to submit a proposal for AVRO-457. Can you do only AVRO-458?

On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang  wrote:

> Hi all,
>
>
>
> This is Hua Huang, a CS master student from Simon Fraser University,
> Canada.
> I am going to participate in the Google Summer of Code 2010 and I also find
> out that several projects of AVRO are quite interesting, especially
> AVRO-456(add tools that read/write json records from/to avro data files)
> together with AVRO-457 and AVRO-458.
>
>
>
> I plan to submit a proposal for these projects which would produce a C/C++
> command-line tool to support transformation between AVRO data and other
> types of data, like CSV, Json or XML. My key idea is to use parallel bit
> stream technology to speed up the parsing procedure in order to build a
> high
> performance tool which will be very useful in practical, especially in the
> large-scale dataset.
>
>
>
> I sent an email to Doug Cutting(cutt...@apache.org) who is the reporter of
> these projects, but I haven't received any reply yet. So I am wondering, is
> there anybody who can communicate with me for the details of the projects,
> or even suggest me a person so that I could contact with him/her for the
> details?
>
>
>
> Any feedback is really appreciated. Thank you very much.
>
>
>
> Yours Sincerely,
>
> Hua
>
>
>
>
>
>


-- 
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com