Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-06 Thread Scott Carey
I have very little knowledge about GSoC and do not know what is considered a 
project there.  

Avro doesn't define 'projects'.  Its JIRA tickets can be items that take an 
hour to do or a month.
Some of the tasks related to converting CSV, XML, or JSON data into or out 
of Avro are very simple, and some are not.  The scope of all three, including a 
unified tool for conversion and testing, is a fairly large overall project.  

JSON would be the easiest, since Avro already supports serialization to and 
from it.
XML has the potential to be the most complete and flexible.  It is a lot of 
work but would use standard APIs and tools.
CSV has the most restrictions -- it can't support recursive schemas or unions 
well.  Additionally, it can have some tricky corner cases since it is not a 
strict standard.  It will likely be the most popular 'import into Avro' 
variation, however. 
The tools for conversion (command line or other) and the testing required for 
all of the above are non-trivial, mostly because such a tool has to be very 
good at clear error handling and reporting or it won't be very useful in the 
real world.
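
To make the CSV restrictions above concrete, here is a rough sketch in Python 
(standard library only).  It is not Avro's API: the schemas are plain dicts 
written in Avro's JSON style, and the names csv_columns and csv_to_records are 
hypothetical, chosen only for illustration.  It assumes each CSV column maps to 
exactly one primitive field, which is why a union or a recursive record has no 
obvious column layout, and it reports errors with line and column context.

```python
import csv
import io

# Hypothetical, simplified schemas written as plain dicts in Avro's JSON style.
FLAT = {"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
]}

# A recursive schema: a list node that refers to itself through a union.
RECURSIVE = {"type": "record", "name": "Node", "fields": [
    {"name": "value", "type": "int"},
    {"name": "next", "type": ["null", "Node"]},
]}

# Assumption for this sketch: only these primitive types map to a CSV cell.
PRIMITIVES = {"string": str, "int": int, "long": int,
              "float": float, "double": float}


def csv_columns(schema):
    """Return (column name, parser) pairs, or raise ValueError if a field
    cannot be represented as a single flat CSV column."""
    columns = []
    for field in schema["fields"]:
        ftype = field["type"]
        if isinstance(ftype, list):
            raise ValueError(
                f"field '{field['name']}': union {ftype} has no single CSV column type")
        if ftype not in PRIMITIVES:
            raise ValueError(
                f"field '{field['name']}': type {ftype!r} is not a flat primitive")
        columns.append((field["name"], PRIMITIVES[ftype]))
    return columns


def csv_to_records(text, schema):
    """Parse CSV text into a list of dicts shaped like the record schema."""
    columns = csv_columns(schema)
    records = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != len(columns):
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(row)}")
        record = {}
        for (name, parse), cell in zip(columns, row):
            try:
                record[name] = parse(cell)
            except ValueError:
                raise ValueError(
                    f"line {line_no}, column '{name}': cannot parse {cell!r}") from None
        records.append(record)
    return records
```

With these definitions, csv_to_records("alice,30\n", FLAT) yields one record, 
while csv_columns(RECURSIVE) raises a ValueError naming the 'next' field.  A 
real tool would also need to handle headers, quoting dialects, and nulls.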

-Scott




Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Hua Huang
Hi all,

This is Hua Huang, a CS master's student from Simon Fraser University, Canada.
I am going to participate in the Google Summer of Code 2010, and I have found
that several Avro projects are quite interesting, especially
AVRO-456 (add tools that read/write JSON records from/to Avro data files)
together with AVRO-457 and AVRO-458.

I plan to submit a proposal for these projects which would produce a C/C++
command-line tool to support transformation between Avro data and other
types of data, like CSV, JSON, or XML. My key idea is to use parallel bit
stream technology to speed up parsing in order to build a high-performance
tool that will be very useful in practice, especially on large-scale
datasets.

I sent an email to Doug Cutting (cutt...@apache.org), who is the reporter of
these projects, but I haven't received a reply yet. So I am wondering: is
there anybody who can discuss the details of the projects with me, or
suggest a person I could contact for the details?

Any feedback is really appreciated. Thank you very much.

Yours Sincerely,

Hua



Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Jasintha Dasanayaka
Hey!
I am going to submit a proposal for AVRO-457. Can you do AVRO-458 only?

-- 
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com


Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Scott Carey
FYI, it seems as though at least one other person has chosen a similar task for 
GSOC:


http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e




Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Zheng Yang
Hi Scott,

As you can see, these three tools can eventually be combined into one
multifunctional tool that accepts different formats (CSV, XML, JSON, ...).
Since I'm also going to submit a proposal for this, may I know whether they
are considered to be one project or three?

Yang

-- 
School of Computing / Computing / Year 2
National University of Singapore


Re: Avro-456/457/458 in Google Summer of Code 2010

2010-04-05 Thread Jasintha Dasanayaka
I think they are three projects.

-- 
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com