Hi all,

I have almost finished my GSoC proposal for the project "a data
communication tool between json/xml/csv and avro data files". I will
describe it here and look forward to your advice.

The tool has two main parts:

1. Data communication module, i.e. read/write json/xml/csv records from/to
avro data files
There are two steps:

Step one: read/write json/xml/csv records to AVRO datum

For json:
AVRO already supplies ParsingDecoder and JsonGenerator, so we can use these
two classes to move data between AVRO datum and json data.

For XML:
I must extend the abstract ParsingDecoder and build an XMLDecoder class that
parses data from an XML file and converts it to AVRO datum. An XMLGenerator
class that converts AVRO datum to an XML data file is also necessary. This
part needs some XML parsing; Apache Xerces may be a good choice, and
fortunately I am familiar with it.

For CSV:
Similarly, I must build a CSVDecoder to convert CSV data to AVRO datum and a
CSVGenerator class to convert AVRO datum to CSV files. This part needs some
CSV handling, and I think Apache Commons CSV can help us.
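
To make the CSV direction a little more concrete, here is a minimal sketch
of the row-to-record mapping that a CSVDecoder/CSVGenerator pair would need.
It is only an illustration: the class and method names are hypothetical, it
uses a plain Map instead of a real AVRO datum, and it splits on commas
naively (no quoting or escaping), which is exactly the part a CSV library
would take over.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of AVRO or Apache Commons CSV. */
class CsvRecordSketch {

    /** Pairs each header name with the value in the same column. */
    static Map<String, String> toRecord(String headerLine, String dataLine) {
        // Limit -1 keeps trailing empty columns instead of dropping them.
        String[] names = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                "expected " + names.length + " columns but got " + values.length);
        }
        // LinkedHashMap preserves column order for the reverse conversion.
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i].trim(), values[i].trim());
        }
        return record;
    }

    /** Reverse direction: joins the field values back into one CSV line. */
    static String toCsvLine(Map<String, String> record) {
        return String.join(",", record.values());
    }
}
```

In the real tool the Map would be replaced by an AVRO datum built against
the schema, and the field values would be converted to the schema's types.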

Step two: read/write AVRO datum to avro data files

AVRO has already implemented this function, so it will not cost me much
time and energy.

2. Command-tool interface design

Basic interface design:

The tool is based on Java Swing. It is made up of a command input text area
and an information output panel that shows the current status, command
execution results, data output, etc.

Command system design:
1). Each command is a class that implements an interface called
BasicCommand, which has an execute function. Each command implementation
class must put its concrete operations in the execute function.
2). An XML configuration file registers the command classes with the
command system. At the beginning the tool will have some basic commands (I
introduce them below). In the future, if we want to add more commands to
the tool, we only need to write the corresponding command class and
register it.
3). During initialization, the tool will parse the command configuration
XML file, instantiate the command classes, and load them into the context.
It will use an ArrayList to store all the system commands while running.
4). When the user enters a command, the tool traverses the command list. If
the command exists and its arguments have the correct format, the tool
executes it (that is, it invokes the command instance's execute function).
If the command exists but the arguments do not match its declaration, the
tool prints usage information for the command. If the tool cannot find the
command, it tells the user "the command is not an available command".
5). The tool uses an XML configuration file to store some system
attributes, such as the default workspace, the default work mode (json, xml
or csv), the output font size, etc.
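
As a rough sketch of points 1) to 4), the command system could look
something like the following. Only the names BasicCommand and execute come
from the proposal; the other methods (name, usage, acceptsArguments), the
ModeCommand example and the CommandSystem class are assumptions made for
illustration. Parsing the XML configuration file is left out: the sketch
simply takes the class names that such a file would list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Command contract; methods other than execute are hypothetical additions. */
interface BasicCommand {
    String name();                            // word the user types
    String usage();                           // shown when arguments do not match
    boolean acceptsArguments(String[] args);  // argument format check
    String execute(String[] args);            // the concrete operation
}

/** Example command: switches the work mode between json, xml and csv. */
class ModeCommand implements BasicCommand {
    private String mode = "json";

    public String name() { return "mode"; }

    public String usage() { return "usage: mode <json|xml|csv>"; }

    public boolean acceptsArguments(String[] args) {
        return args.length == 1
            && Arrays.asList("json", "xml", "csv").contains(args[0]);
    }

    public String execute(String[] args) {
        mode = args[0];
        return "work mode is now " + mode;
    }
}

/** Holds the registered commands in an ArrayList and dispatches input lines. */
class CommandSystem {
    private final List<BasicCommand> commands = new ArrayList<>();

    /** Instantiates a command class by name, as the XML registry would. */
    void registerByClassName(String className) {
        try {
            Object instance =
                Class.forName(className).getDeclaredConstructor().newInstance();
            commands.add((BasicCommand) instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot register " + className, e);
        }
    }

    /** Finds the command by its first word, then executes it or prints usage. */
    String dispatch(String line) {
        String[] tokens = line.trim().split("\\s+");
        String[] args = Arrays.copyOfRange(tokens, 1, tokens.length);
        for (BasicCommand command : commands) {
            if (command.name().equals(tokens[0])) {
                return command.acceptsArguments(args)
                    ? command.execute(args)
                    : command.usage();
            }
        }
        return "the command is not an available command";
    }
}
```

With this shape, adding a command means writing one class and adding its
name to the configuration file; the dispatch loop does not change.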

System initialization commands design:
1). workspace set up command;
2). get history workspace command;
3). work mode change command;
4). list data files command;
5). data output command:
This command behaves differently in each work mode. For example, in json
mode the data is output as a json string, but in xml mode the data is
output as an xml file. The user can also choose a specific output mode with
an argument; the default output mode is the current work mode.
This command can also be given a specific output stream, so the data is
either exported to a data file or shown in the tool interface. The default
output stream is the tool interface.
6). data input command:
This command is used to input data and convert it to an AVRO data file. It
has four work modes, and the user selects one with a command argument:

mode 1: input schema data and content data from an IO device;
mode 2: input schema data from an IO device but content data from a data
file on the local disk;
mode 3: input schema data from a data file on the local disk but content
data from an IO device;
mode 4: input schema data and content data from data files on the local
disk.
The default mode is mode 1. When the user enters this command and presses
Enter, a Swing panel appears in which the user can complete the input. Each
mode has its own Swing input panel, four in all.
7). basic system set up command; this may include setting the font, font
size, color, etc.
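
The four modes of the data input command in 6) are really two independent
choices (where the schema comes from and where the content comes from), so
they could be modelled as below. This is only a sketch: the enum name, its
fields and the idea of passing the mode as "1" to "4" are my assumptions,
since the exact argument syntax is not decided yet.

```java
/** Hypothetical model of the four input modes described in 6). */
enum InputMode {
    MODE_1(false, false),  // schema and content from an IO device
    MODE_2(false, true),   // schema from an IO device, content from a file
    MODE_3(true, false),   // schema from a file, content from an IO device
    MODE_4(true, true);    // schema and content from files

    final boolean schemaFromFile;
    final boolean contentFromFile;

    InputMode(boolean schemaFromFile, boolean contentFromFile) {
        this.schemaFromFile = schemaFromFile;
        this.contentFromFile = contentFromFile;
    }

    /** Maps the command argument to a mode; no argument means mode 1. */
    static InputMode fromArgument(String arg) {
        if (arg == null || arg.isEmpty()) {
            return MODE_1;
        }
        switch (arg) {
            case "1": return MODE_1;
            case "2": return MODE_2;
            case "3": return MODE_3;
            case "4": return MODE_4;
            default:
                throw new IllegalArgumentException("unknown input mode: " + arg);
        }
    }
}
```

The two flags could then decide which of the four Swing input panels to
open.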

These are my main ideas. If anyone has advice or suggestions, please let me
know. Thank you :-)

Peng
On Mon, Mar 22, 2010 at 1:31 PM, Peng Cui <ajiu....@gmail.com> wrote:

> Hi Doug,
>
> My name is Cui Peng. I want to implement the data communicate tool between
> json/xml/csv and avro data files as you described in the GSoC 2010 idea
> list. I exported AVRO source code,research its design and architect,then i
> got mainly idea about the tool, then i will show it to you,and expecting
> your advises :-)
>
> I think there are mainly two parts of jobs to do:
>
> 1. Read/write json/xml/csv records from/to avro data files
> There are two steps:
>
> Step one: read/write json/xml/csv records to AVRO datum
>
> For json:
> AVRO supplies ParsingDecoder and JsonGenerator already,we can use these two
> classes to communicate data between AVRO datum and json data.
> For XML:
> I must extends the abstract ParsingDecoder,and build XMLDecoder  class to
> parse data from XML file,and convert it to AVRO datum. And also,a
> XMLGenerator class which is used to change AVRO datum to XML data file is
> also necessary. This section need some XML parse jobs,may be Apache Xerces
> is a good choice, fortunately, i am familiar with it.
> For CSV:
> Also,i must build a CSVDecoder to convert CSV data to AVRO datum and a
> CSVGenerator class to convert AVRO datum to CSV files. This section need
> some operations with CSV data,I think Apache Commons csv can help us.
>
> Step two: read/write AVRO datum to avro data files
> AVRO has implemented this function already,so, i will not cost me much time
> and energy
>
> 2. A Swing based command-line tool,this tool will help us to execute some
> commands, collect data from user input etc.
> Step one give us data communicate support between json/xml/csv data files
> and avro data files,then,we should build the command-line tool and design
> its command system.
>
> 1).this tool will have three mode,json,xml or csv model,can use special
> command to  swith working model
> 2).this tool will support two data input model,from keyborad or from exist
> data file
> 3).its command adopts command and argument form,for example,"input -f"
> means import data from existing data files,"input -k" means give user
> a graphics data input area,user can input data though keyboard
> 4).data output format function
> 5).if  exception occurs, it will show in the tool
>
>
> That is all,if you have any ideas,please let me know. Thank you and best
> regards
>
