[ https://issues.apache.org/jira/browse/CONNECTORS-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375193#comment-14375193 ]

Tugba Dogan commented on CONNECTORS-1162:
-----------------------------------------

Hi,
I am Tugba Dogan, currently an undergraduate student at Bilkent University.
I am really interested in working on this project for GSoC 2015. I will graduate
on 1 June 2015 and will have no other commitments during the summer besides
the GSoC project, so I think I can work 7-8 hours per day on weekdays. This
will be my first GSoC experience.
I want to work in the Big Data industry after graduation, and I think this
project will help me get involved in that area. I would like to discuss the
details of this project and get your feedback on my proposal.

I have installed a ManifoldCF instance on my server and started using it. I
can also set up single-node and distributed Kafka clusters to test the
integration during development, and I have some knowledge of Kafka as well.
I think we could also implement a repository connector for Kafka, because it
would be useful for transferring data from Kafka to other output connectors
such as Solr, Elasticsearch and HDFS.
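On the repository side, a Kafka consumer tracks its position in each partition by offset. A minimal, stdlib-only sketch of one incremental "crawl" of a single partition might look like the following. It is not the Kafka client API: the partition is modelled as an in-memory list, and `PartitionReader`, `readFrom` and `Batch` are hypothetical names chosen for illustration.

```java
import java.util.List;

// Hypothetical sketch of one incremental crawl of a single Kafka partition,
// modelled here as an in-memory list (not the real Kafka consumer API).
// The connector would remember the next offset to read, so a re-crawl
// only picks up messages appended since the previous pass.
final class PartitionReader {
    // Result of one pass: the messages read and the offset to resume from.
    record Batch(List<String> messages, long nextOffset) {}

    static Batch readFrom(List<String> partition, long startOffset, int maxMessages) {
        int from = (int) Math.min(startOffset, partition.size());
        int to = Math.min(from + maxMessages, partition.size());
        // Copy the slice so the batch is unaffected by later appends
        List<String> slice = List.copyOf(partition.subList(from, to));
        return new Batch(slice, to);
    }
}
```

Storing `nextOffset` between crawls is what would let the connector hand only new messages to Solr, Elasticsearch or HDFS on each run.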

Since Kafka does not currently provide any ACL features, we won't need an
authority connector for Kafka at this time. The Kafka developers are planning
to implement these features in future releases, so we might add that support
to ManifoldCF later.

Here are my planned deliverables for this project:
Output Connectors for Kafka 0.8.x and 0.1-0.7.x
Unit & Integration tests for output connector
Repository Connectors for Kafka 0.8.x and 0.1-0.7.x
Unit & Integration tests for repository connector

I believe Kafka 0.8.x is not backward compatible with older versions. Do you
think we should implement connectors for the older versions as well?

Thanks

Proposal Draft: 
https://docs.google.com/document/d/1KDsWgTwMhpPqx6SPKiYb8bQwKiOSoFrIzcX8wrl91C0/edit?usp=sharing

> Apache Kafka Output Connector
> -----------------------------
>
>                 Key: CONNECTORS-1162
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1162
>             Project: ManifoldCF
>          Issue Type: Wish
>    Affects Versions: ManifoldCF 1.8.1, ManifoldCF 2.0.1
>            Reporter: Rafa Haro
>            Assignee: Rafa Haro
>              Labels: gsoc, gsoc2015
>             Fix For: ManifoldCF 1.9, ManifoldCF 2.1
>
>
> Kafka is a distributed, partitioned, replicated commit log service. It 
> provides the functionality of a messaging system, but with a unique design. A 
> single Kafka broker can handle hundreds of megabytes of reads and writes per 
> second from thousands of clients.
> Apache Kafka is being used for a number of use cases. One of them is to use
> Kafka as a feeding system for streaming BigData processes, in both Apache
> Spark and Hadoop environments. A Kafka output connector could be used for
> streaming or dispatching crawled documents or metadata and putting them into
> a BigData processing pipeline.
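To make the "partitioned commit log" idea concrete for the output side, here is a minimal, stdlib-only Java sketch. It is not the Kafka producer API: `PartitionedLog`, `KafkaOutputSketch`, `append` and `dispatch` are hypothetical names, and choosing the partition by hashing the document URI is an assumption. Keying by URI keeps every version of one document in the same partition, which matters because Kafka only orders messages within a partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a Kafka topic: a fixed number of
// append-only partitions, each message addressed by its offset.
class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    int numPartitions() {
        return partitions.size();
    }

    // Appends a message and returns its offset within the partition.
    long append(int partition, String message) {
        List<String> log = partitions.get(partition);
        log.add(message);
        return log.size() - 1;
    }

    // Reads the message stored at the given offset.
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }
}

// Hypothetical output-connector step (assumption: partition chosen by
// hashing the document URI, so all versions of a document stay ordered).
class KafkaOutputSketch {
    static int partitionFor(String documentUri, int numPartitions) {
        // floorMod keeps the result non-negative even if hashCode() is negative
        return Math.floorMod(documentUri.hashCode(), numPartitions);
    }

    static long dispatch(PartitionedLog log, String documentUri, String payload) {
        int partition = partitionFor(documentUri, log.numPartitions());
        return log.append(partition, documentUri + "\t" + payload);
    }
}
```

A real connector would replace `PartitionedLog` with the Kafka client library's producer, but the key-to-partition mapping and the append-only, offset-addressed model would be the same.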



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)