Vishal Karande created SAMOA-40:
-----------------------------------
Summary: Add Kafka stream reader modules to consume data from
Kafka framework
Key: SAMOA-40
URL: https://issues.apache.org/jira/browse/SAMOA-40
Project: SAMOA
Issue Type: Task
Components: Infrastructure, SAMOA-API
Environment: OS X Version 10.10.3
Reporter: Vishal Karande
Priority: Minor
Apache SAMOA is designed to process streaming data and to develop streaming
machine learning algorithms. Currently, the SAMOA framework can read stream
data only from ARFF files. When SAMOA is used as a streaming machine learning
component in real-time use cases, writing data to files and reading it back
is slow and inefficient.
A single Kafka broker can handle hundreds of megabytes of reads and writes
per second from thousands of clients. The ability to read data directly from
Apache Kafka into SAMOA will not only improve performance but also make SAMOA
pluggable into many real-time machine learning use cases, such as the
Internet of Things (IoT).
GOAL:
Add code that enables SAMOA to read data from Apache Kafka as a data stream.
The Kafka stream reader supports the following options for streaming:
a) Topic selection - Kafka topic to read data from
b) Partition selection - Kafka partition to read data from
c) Batching - Number of data instances read from Kafka in one read request
d) Configuration options - Kafka port number, seed information, time delay
between two read requests
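
As a rough illustration of how the options listed above could be grouped, here
is a minimal sketch of a configuration holder. The class name
KafkaStreamOptions and all of its fields are hypothetical and are not taken
from the SAMOA or Kafka code bases. It also assumes that "seed information"
means a list of seed broker host names, which this issue does not confirm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for the streaming options (a-d); not a SAMOA or Kafka class.
final class KafkaStreamOptions {
    private final String topic;             // a) topic to read from
    private final int partition;            // b) partition to read from
    private final int batchSize;            // c) instances per read request
    private final int port;                 // d) Kafka port number
    private final List<String> seedBrokers; // d) assumed meaning of "seed information"
    private final long readDelayMillis;     // d) delay between two read requests

    KafkaStreamOptions(String topic, int partition, int batchSize, int port,
                       List<String> seedBrokers, long readDelayMillis) {
        // Reject values that could never describe a valid read request.
        if (topic == null || topic.isEmpty()) {
            throw new IllegalArgumentException("topic must be set");
        }
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0");
        }
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port must be in 1..65535");
        }
        if (seedBrokers == null || seedBrokers.isEmpty()) {
            throw new IllegalArgumentException("at least one seed broker is required");
        }
        if (readDelayMillis < 0) {
            throw new IllegalArgumentException("readDelayMillis must be >= 0");
        }
        this.topic = topic;
        this.partition = partition;
        this.batchSize = batchSize;
        this.port = port;
        // Defensive copy so later changes to the caller's list have no effect.
        this.seedBrokers = Collections.unmodifiableList(new ArrayList<>(seedBrokers));
        this.readDelayMillis = readDelayMillis;
    }

    String topic() { return topic; }
    int partition() { return partition; }
    int batchSize() { return batchSize; }
    int port() { return port; }
    List<String> seedBrokers() { return seedBrokers; }
    long readDelayMillis() { return readDelayMillis; }
}
```

Validating in the constructor means a misconfigured stream fails at setup
time rather than on the first read request.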
Components:
KafkaReader - Consists of APIs to read data from Kafka
KafkaStream - Stream source for SAMOA providing data read from Kafka
Dependencies for Kafka are added to pom.xml in the samoa-api component.
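
The split between a reader and a stream source described above might look
roughly like the sketch below. It is not the proposed implementation: the
names BatchReader, InMemoryReader and BatchedStream are hypothetical, an
in-memory list stands in for a Kafka partition so the example runs without a
broker, and the time delay between read requests is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical reader contract: fetch up to maxRecords starting at offset.
interface BatchReader {
    List<String> readBatch(long offset, int maxRecords);
}

// In-memory stand-in for a Kafka partition (assumption, for illustration only).
final class InMemoryReader implements BatchReader {
    private final List<String> log;

    InMemoryReader(List<String> log) {
        this.log = new ArrayList<>(log);
    }

    @Override
    public List<String> readBatch(long offset, int maxRecords) {
        // Clamp the requested range to what the log actually holds.
        int from = (int) Math.max(0L, Math.min(offset, (long) log.size()));
        int to = Math.min(from + Math.max(0, maxRecords), log.size());
        return new ArrayList<>(log.subList(from, to));
    }
}

// Hypothetical stream source: hands out one instance at a time and only
// issues a new read request when its local buffer is empty.
final class BatchedStream {
    private final BatchReader reader;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private long nextOffset = 0;
    private int readRequests = 0;

    BatchedStream(BatchReader reader, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.reader = reader;
        this.batchSize = batchSize;
    }

    // Returns the next instance, or null if no data is available right now.
    String nextInstance() {
        if (buffer.isEmpty()) {
            List<String> batch = reader.readBatch(nextOffset, batchSize);
            readRequests++;
            nextOffset += batch.size();
            buffer.addAll(batch);
        }
        return buffer.poll();
    }

    // Number of read requests issued to the reader so far.
    int readRequests() {
        return readRequests;
    }
}
```

With a batch size of 2, reading five instances costs three read requests
instead of five, which is the point of the batching option.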
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)