GSOC Create Sql adapters proposal

2010-04-09 Thread Necati Batur
Hi,

Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all
the algorithms to use;

Necati Batur ; necatiba...@gmail.com



Mahout / Mahout - 332 : Assigned Mentor is Robin Anil



Proposal Abstract:

It would be useful to use Thrift as the protocol for talking to the NoSQL
systems, rather than each system's native API. That way a single abstraction
can cover NoSQL systems in general, with specific Thrift client
implementations added underneath to maximize code reuse. Even if the port is
completed for only one NoSQL client, having this demarcation would make it
easier for others to pick up the work and port it to further systems.
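
To make the idea of a common abstraction concrete, here is a minimal sketch
in Java. All names in it (Record, DataAdapter, InMemoryAdapter, AdapterSketch)
are hypothetical and do not come from Mahout or Thrift. The in-memory class
only stands in for a backend-specific client, such as one built on Thrift, so
that the example runs without any external system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical store-neutral record: a row key plus named numeric columns. */
final class Record {
    final String key;
    final Map<String, Double> columns;

    Record(String key, Map<String, Double> columns) {
        this.key = key;
        this.columns = columns;
    }
}

/** Hypothetical abstraction that the algorithms would code against. */
interface DataAdapter extends AutoCloseable {
    /** Iterates over all records of the named table (or column family). */
    Iterator<Record> scan(String table);

    @Override
    void close();
}

/** In-memory stand-in for a backend-specific implementation. */
final class InMemoryAdapter implements DataAdapter {
    private final Map<String, List<Record>> tables = new LinkedHashMap<>();

    void put(String table, Record record) {
        tables.computeIfAbsent(table, t -> new ArrayList<>()).add(record);
    }

    @Override
    public Iterator<Record> scan(String table) {
        List<Record> rows = tables.get(table);
        return rows == null ? Collections.<Record>emptyIterator() : rows.iterator();
    }

    @Override
    public void close() {
        // Nothing to release for the in-memory stand-in.
    }
}

final class AdapterSketch {
    /** Caller code depends only on DataAdapter, never on the backend. */
    static int countRecords(DataAdapter adapter, String table) {
        int n = 0;
        for (Iterator<Record> it = adapter.scan(table); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        InMemoryAdapter adapter = new InMemoryAdapter();
        adapter.put("ratings", new Record("user1", Collections.singletonMap("item1", 4.0)));
        adapter.put("ratings", new Record("user2", Collections.singletonMap("item2", 3.5)));
        System.out.println(countRecords(adapter, "ratings"));
        adapter.close();
    }
}
```

Under this assumption, porting to another store would mean adding one more
DataAdapter implementation while code such as countRecords stays unchanged.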

Detailed Description:

The data adapters for the higher-level languages will require a good command
of data structures and some knowledge of finite mathematics, and I am
confident in both areas. In addition, the code in the svn repository seems
to need some improvements and documentation.

Briefly, I would do the following for this project:



   1. Understand the underlying maths for the adapters
   2. Determine the data structures that the adapters will use
   3. Implement the necessary methods to create these structures
   4. Write test cases to check whether the code covers everything required
   by data retrieval operations
   5. Iterate on the code to make the algorithms more robust
   6. Document the project so that this particular work ties into the
   overall scope
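
As an illustration of steps 2 to 4, the following is a minimal sketch in
Java of one possible data structure: a column-name index that turns
key/value rows into fixed-length dense arrays. The class RowVectorizer and
its methods are hypothetical, and plain double[] arrays are used here as an
assumption in place of any Mahout vector type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that maps named columns to array positions. */
final class RowVectorizer {
    // Column name -> position in the output array, in first-seen order.
    private final Map<String, Integer> columnIndex = new LinkedHashMap<>();

    /** Registers every column name so that all vectors share one layout. */
    void fit(List<Map<String, Double>> rows) {
        for (Map<String, Double> row : rows) {
            for (String column : row.keySet()) {
                columnIndex.putIfAbsent(column, columnIndex.size());
            }
        }
    }

    /** Number of distinct columns seen so far. */
    int dimension() {
        return columnIndex.size();
    }

    /** Converts one row to a dense array; missing columns stay 0.0. */
    double[] toVector(Map<String, Double> row) {
        double[] vector = new double[columnIndex.size()];
        for (Map.Entry<String, Double> e : row.entrySet()) {
            Integer i = columnIndex.get(e.getKey());
            if (i != null) {
                vector[i] = e.getValue();
            }
        }
        return vector;
    }
}
```

A test case for step 4 could then fit the index on a few rows and check
that each resulting array has the expected length and values.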

Additional Information:

First of all, I am very excited to join an organization like GSoC and, most
importantly, to work for a big open source project like Apache. I am looking
forward to good collaboration and new challenges in software development.
Information management issues especially sound great to me. I am confident
working with new technologies. I took the Data Structures I and II courses
at university, so I am comfortable with data structures. Most importantly, I
am interested in databases. From my software engineering courses, I know how
to work on a project with iterative development and timelines.


[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use

2010-04-03 Thread Necati Batur (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853167#action_12853167 ]

Necati Batur commented on MAHOUT-332:
-

I am a senior computer engineering student at IZTECH in Turkey. My areas of
interest are information management, OOP (Object Oriented Programming) and,
currently, bioinformatics. I have been working with an Assistant Professor
(Jens Allmer) in the molecular biology and genetics department for one year.
First, we worked on a protein database, 2DB, and presented the project at
HIBIT09. The project was "Database management system independence by
amending 2DB with a database access layer" (written in Java). Currently, I am
working on another project (Kerb) as my senior project. It is a general
sequential task management system intended to reduce errors and save time in
biological experiments. We will present this project at HIBIT2010 too.

I am confident working with new technologies. I took the Data Structures I
and II courses at university, so I am comfortable with data structures. Most
importantly, I am interested in databases. From my software engineering
courses, I know how to work on a project with iterative development and
timelines. To add more functionality, I need a mentor to contact for this
project.

 Create adapters for  MYSQL and NOSQL(hbase, cassandra) to access data for all 
 the algorithms to use
 ---

 Key: MAHOUT-332
 URL: https://issues.apache.org/jira/browse/MAHOUT-332
 Project: Mahout
  Issue Type: New Feature
Reporter: Robin Anil

 A student with a good proposal 
 - should be free to work for Mahout in the summer and should be thrilled to 
 work in this area :)
 - should be able to program in Java and be comfortable with data structures
 and algorithms
 - must explore SQL and NOSQL implementations, and design a framework with 
 which data from them could be fetched and converted to mahout format or used 
 directly as a matrix transparently
 - should have a plan to make it high performance with ample caching 
 strategies or the ability to use it on a map/reduce job
 - should focus more on getting a working version than on implementing all
 functionalities. So it's recommended that you divide features into milestones
 - must have clear deadlines and pace it evenly across the span of 3 months.
 If you can do something extra it counts, but make sure the plan is reasonable 
 within the specified time frame.
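
The caching requirement above can be sketched as follows. This is a minimal,
single-threaded example in Java of one possible strategy, a least-recently-used
cache placed in front of a row loader. The class LruRowCache and the loader
function are hypothetical and are not part of Mahout. The sketch relies on
java.util.LinkedHashMap in access order with removeEldestEntry overridden.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache in front of a slow row lookup. Not thread-safe. */
final class LruRowCache<K, V> {
    private final Function<K, V> loader;
    private final Map<K, V> cache;
    private int loads = 0;

    LruRowCache(final int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached value, calling the loader only on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the underlying loader was called. */
    int loadCount() {
        return loads;
    }
}
```

In a real adapter the loader would be the call that fetches a row from the
database. A map/reduce job would need a different approach, since this cache
lives in one JVM only.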

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAHOUT-332) Create adapters for MYSQL and NOSQL(hbase, cassandra) to access data for all the algorithms to use

2010-04-03 Thread Necati Batur (JIRA)

[ https://issues.apache.org/jira/browse/MAHOUT-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853177#action_12853177 ]

Necati Batur commented on MAHOUT-332:
-

Well, it will not be too hard to understand the conversion of data into
vectors if there is a source and an algorithm already :)
However, could you please give me the necessary links to check out? The
website has so many repositories that I can hardly tell what is where.
Also, how should I write a proposal if I am asked to write one?
Thanks
