Vishal Karande created SAMOA-40:
-----------------------------------

             Summary: Add Kafka stream reader modules to consume data from 
Kafka framework
                 Key: SAMOA-40
                 URL: https://issues.apache.org/jira/browse/SAMOA-40
             Project: SAMOA
          Issue Type: Task
          Components: Infrastructure, SAMOA-API
         Environment: OS X Version 10.10.3
            Reporter: Vishal Karande
            Priority: Minor


Apache SAMOA is designed to process streaming data and to develop streaming
machine learning algorithms. Currently, the SAMOA framework can read stream
data only from ARFF files. In real-time use cases where SAMOA serves as the
streaming machine learning component, writing data to files and reading it
back is slow and inefficient.

A single Kafka broker can handle hundreds of megabytes of reads and writes per
second from thousands of clients. The ability to read data directly from
Apache Kafka into SAMOA will not only improve performance but also make SAMOA
pluggable into many real-time machine learning use cases, such as the
Internet of Things (IoT).

GOAL:
Add code that enables SAMOA to read data from Apache Kafka as a stream.
The Kafka stream reader supports the following streaming options:

a) Topic selection - Kafka topic to read data from
b) Partition selection - Kafka partition to read data from
c) Batching - Number of data instances read from Kafka in one read request
d) Configuration options - Kafka port number, seed information, time delay
between two read requests
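
As a rough illustration of options (a) through (d), the settings could be
grouped in one immutable holder. This is only a sketch: the class name
KafkaStreamOptions, its field names, and the validation rules are assumptions
made for this example and are not taken from the SAMOA patch or from the Kafka
client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical options holder; all names here are illustrative assumptions.
final class KafkaStreamOptions {
    final String topic;             // (a) Kafka topic to read data from
    final int partition;            // (b) partition within that topic
    final int batchSize;            // (c) instances fetched per read request
    final List<String> seedBrokers; // (d) seed broker host names
    final int port;                 // (d) Kafka port number
    final long readDelayMillis;     // (d) delay between two read requests

    KafkaStreamOptions(String topic, int partition, int batchSize,
                       List<String> seedBrokers, int port, long readDelayMillis) {
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be non-empty");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        if (seedBrokers == null || seedBrokers.isEmpty()) {
            throw new IllegalArgumentException("at least one seed broker is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port must be in 1..65535");
        }
        if (readDelayMillis < 0) {
            throw new IllegalArgumentException("readDelayMillis must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.seedBrokers = Collections.unmodifiableList(new ArrayList<>(seedBrokers));
        this.port = port;
        this.readDelayMillis = readDelayMillis;
    }
}
```

Rejecting bad values in the constructor means a misconfigured stream fails
before any read request is sent, rather than on the first read.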

Components:
KafkaReader - Consists of APIs to read data from Kafka
KafkaStream - Stream source for SAMOA providing data read from Kafka
Dependencies for Kafka are added to pom.xml in the samoa-api component.
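
The split between a reader and a stream source might look roughly like the
sketch below. It does not reproduce the actual KafkaReader or KafkaStream
classes: BatchReader, InMemoryBatchReader, and BatchedStream are hypothetical
names, and an in-memory list stands in for a Kafka partition so the example
runs without a broker. The time delay between read requests is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical reader contract: returns up to maxInstances records,
// or an empty list when nothing is available.
interface BatchReader {
    List<String> readBatch(int maxInstances);
}

// In-memory stand-in for a Kafka-backed reader (assumption for this sketch).
final class InMemoryBatchReader implements BatchReader {
    private final Deque<String> records;

    InMemoryBatchReader(List<String> records) {
        this.records = new ArrayDeque<>(records);
    }

    @Override
    public List<String> readBatch(int maxInstances) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxInstances && !records.isEmpty()) {
            batch.add(records.poll());
        }
        return batch;
    }
}

// Stream facade: refills its buffer one batch at a time and hands out
// single instances to the consumer.
final class BatchedStream {
    private final BatchReader reader;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private int readRequestCount = 0;

    BatchedStream(BatchReader reader, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.reader = reader;
        this.batchSize = batchSize;
    }

    // Returns the next instance, or null when the reader has nothing more.
    String nextInstance() {
        if (buffer.isEmpty()) {
            buffer.addAll(reader.readBatch(batchSize));
            readRequestCount++;
        }
        return buffer.poll();
    }

    // Number of read requests issued to the underlying reader so far.
    int readRequestCount() {
        return readRequestCount;
    }
}
```

With a batch size of 2, five stored records are served in three read requests,
which shows how batching reduces the number of round trips to the source.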



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
