Re: Avro-456/457/458 in Google Summer of Code 2010
I have very little knowledge about GSOC and do not know what is considered a project there. Avro doesn't define 'projects'; its JIRA tickets can be items that take an hour to do or a month.

Some of the tasks related to converting CSV, XML, or JSON data into or out of Avro are very simple, and some are not. The scope of all three, including a unified tool for conversion and testing, is a fairly large overall project.

JSON would be the easiest, since Avro already supports serialization to and from it.

XML has the potential to be the most complete and flexible. It is a lot of work but would use standard APIs and tools.

CSV has the most restrictions: it can't support recursive schemas or unions well. Additionally, it can have some tricky corner cases since it is not a strict standard. It will likely be the most popular 'import into Avro' variation, however.

The tools for conversion (command line or other) and the testing required for all of the above are non-trivial, mostly because such a tool has to be very good at clear error handling and reporting or it won't be very useful in the real world.

-Scott

On Apr 5, 2010, at 6:03 PM, Zheng Yang wrote:

> Hi, Scott,
>
> as you can see, these three tools can eventually be combined into one
> multifunctional tool which accepts different formats (CSV, XML, JSON, ...).
> Since I'm also going to submit a proposal for this, may I know whether they
> are considered to be one project or three?
>
> Yang
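To illustrate the CSV points in Scott's message (flat rows cannot carry unions or nested types, and a converter is only useful if its errors are clear), here is a minimal Python sketch. It does not use any Avro library: the schema is a plain dictionary shaped like an Avro record schema, and `SCHEMA`, `PARSERS`, `check_flat`, and `csv_to_records` are hypothetical names introduced only for this example, not part of Avro or of any of the proposed tools.

```python
import csv
import io

# Hypothetical flat record schema, written in the shape of an Avro JSON schema.
SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "score", "type": "double"},
    ],
}

# Primitive type names mapped to Python parsers (an assumption of this sketch).
PARSERS = {"string": str, "int": int, "long": int, "float": float, "double": float}


def check_flat(schema):
    """Reject field types a single CSV cell cannot represent (unions, nested types)."""
    for field in schema["fields"]:
        ftype = field["type"]
        if not isinstance(ftype, str) or ftype not in PARSERS:
            raise ValueError(
                f"field {field['name']!r}: type {ftype!r} cannot be mapped to a CSV column"
            )


def csv_to_records(text, schema):
    """Convert CSV text to dict records; return (records, errors).

    Bad rows are skipped, and each error message names the CSV line
    and, where possible, the offending field.
    """
    check_flat(schema)
    fields = schema["fields"]
    records, errors = [], []
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        line = reader.line_num
        if len(row) != len(fields):
            errors.append(f"line {line}: expected {len(fields)} columns, got {len(row)}")
            continue
        record = {}
        try:
            for field, cell in zip(fields, row):
                record[field["name"]] = PARSERS[field["type"]](cell)
        except ValueError:
            errors.append(
                f"line {line}: field {field['name']!r}: "
                f"cannot parse {cell!r} as {field['type']}"
            )
            continue
        records.append(record)
    return records, errors


if __name__ == "__main__":
    sample = "alice,30,1.5\nbob,notanumber,2.0\ncarol,41\n"
    good, bad = csv_to_records(sample, SCHEMA)
    print(good)
    for message in bad:
        print(message)
```

The sketch collects errors instead of stopping at the first bad row, so one pass over a large file reports every problem; a real tool would still have to write the accepted records to an Avro data file, which is not shown here.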
Re: Avro-456/457/458 in Google Summer of Code 2010
I think 3 projects.

On Tue, Apr 6, 2010 at 6:33 AM, Zheng Yang wrote:

> Hi, Scott,
>
> as you can see, these three tools can eventually be combined into one
> multifunctional tool which accepts different formats (CSV, XML, JSON, ...).
> Since I'm also going to submit a proposal for this, may I know whether they
> are considered to be one project or three?
>
> Yang

--
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com
Re: Avro-456/457/458 in Google Summer of Code 2010
Hi, Scott,

as you can see, these three tools can eventually be combined into one multifunctional tool which accepts different formats (CSV, XML, JSON, ...). Since I'm also going to submit a proposal for this, may I know whether they are considered to be one project or three?

Yang

On Tue, Apr 6, 2010 at 7:43 AM, Scott Carey wrote:

> FYI, it seems as though at least one other person has chosen a similar task
> for GSOC:
>
> http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e

--
School of Computing / Computing / Year 2
National University of Singapore
Re: Avro-456/457/458 in Google Summer of Code 2010
FYI, it seems as though at least one other person has chosen a similar task for GSOC:

http://mail-archives.apache.org/mod_mbox/hadoop-avro-dev/201003.mbox/%3c179519d11003212231y4537eb03i6f89eb3f6f745...@mail.gmail.com%3e

On Apr 5, 2010, at 4:31 PM, Jasintha Dasanayaka wrote:

> Hey..!
> I am going to submit a proposal for AVRO-457. Can you do AVRO-458 only?
Re: Avro-456/457/458 in Google Summer of Code 2010
Hey..!
I am going to submit a proposal for AVRO-457. Can you do AVRO-458 only?

On Tue, Apr 6, 2010 at 4:00 AM, Hua Huang wrote:

> Hi all,
>
> This is Hua Huang, a CS master's student from Simon Fraser University,
> Canada. I am going to participate in the Google Summer of Code 2010, and I
> found that several AVRO projects are quite interesting, especially AVRO-456
> (add tools that read/write JSON records from/to Avro data files) together
> with AVRO-457 and AVRO-458.
>
> I plan to submit a proposal for these projects which would produce a C/C++
> command-line tool to support transformation between AVRO data and other
> types of data, like CSV, JSON, or XML. My key idea is to use parallel bit
> stream technology to speed up parsing in order to build a high-performance
> tool which will be very useful in practice, especially on large-scale
> datasets.
>
> I sent an email to Doug Cutting (cutt...@apache.org), who is the reporter of
> these projects, but I haven't received any reply yet. So I am wondering, is
> there anybody who can discuss the details of the projects with me, or
> suggest a person I could contact for the details?
>
> Any feedback is really appreciated. Thank you very much.
>
> Yours Sincerely,
>
> Hua

--
Jasintha Dasanayaka
+94 772 916 596
+94 472 232 139
http://www.jasintha.info
jasint...@gmail.com