[ 
https://issues.apache.org/jira/browse/SAMOA-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055484#comment-16055484
 ] 

ASF GitHub Bot commented on SAMOA-65:
-------------------------------------

Github user pwawrzyniak commented on the issue:

    https://github.com/apache/incubator-samoa/pull/64
  
    Thank you nicolas-kourtellis for your feedback!
    
    Regarding your comments, I removed the copyright notices from the code and 
fixed and cleaned up the code as you suggested.
    
    Regarding the AVRO code provided within SAMOA-65, the first difference 
between the avro loader and our kafka avro mapper is that the avro loader works 
with json files containing a schema header and payload data (json or binary). 
In our case the avro mapper works with a byte stream received from kafka, and 
the schema is defined in a separate file. The other difference is that the avro 
loader produces an Instance object, while the Kafka Mapper was designed to work 
with InstanceContentEvent. Moreover, the kafka avro mapper serializes the whole 
InstanceContentEvent object, while the avro loader reads a data file, creates 
an avro structure and, based on that, builds an Instance object. It should be 
possible to use the concept from the avro loader for processing kafka data, but 
I suppose it would require implementing a new generic loader instead of using 
the old one. Another point is that we need two-way serialization, and the avro 
loader is used only for reading data, not writing.
    
    Regarding the JSON parser, as of now it is prepared to parse messages 
coming from Apache Kafka in a "one-by-one" style, serializing the 
InstanceContentEvent class. It could potentially be used to parse a file (line 
by line, for example), and since the mapper accepts a byte array as input, I 
believe it can already be used easily for this task (i.e. as the parser when 
reading data from a text/json file).
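
    To illustrate the file-parsing idea, here is a minimal sketch of how a 
mapper that takes a byte array could be reused line by line. ByteMapper, 
Utf8LineMapper and LineFileParser are hypothetical names made up for this 
example, not classes from the pull request, and the trivial UTF-8 mapper only 
stands in for the real JSON mapper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-way mapper contract: bytes in, object out, and back again.
interface ByteMapper<T> {
    T deserialize(byte[] message);

    byte[] serialize(T value);
}

// Stand-in for the JSON mapper: treats every message as a UTF-8 string.
class Utf8LineMapper implements ByteMapper<String> {
    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serialize(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}

class LineFileParser {
    // Feeds each non-empty line to the mapper as if it were a single Kafka message.
    static <T> List<T> parse(Reader source, ByteMapper<T> mapper) {
        List<T> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.isEmpty()) {
                    result.add(mapper.deserialize(line.getBytes(StandardCharsets.UTF_8)));
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not have to declare IOException.
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

    Because the parser only depends on the byte-array contract, the same 
mapper instance could in principle serve both the Kafka path and the file path.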


> Apache Kafka integration components for SAMOA
> ---------------------------------------------
>
>                 Key: SAMOA-65
>                 URL: https://issues.apache.org/jira/browse/SAMOA-65
>             Project: SAMOA
>          Issue Type: New Feature
>          Components: SAMOA-API, SAMOA-Instances
>            Reporter: Piotr Wawrzyniak
>              Labels: kafka, sink, source, streaming
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> As of now Apache SAMOA includes no integration components for Apache Kafka, 
> meaning in particular no possibility to read data coming from Kafka and write 
> data with prediction results back to Kafka.
> The key assumptions for the development of Kafka-related components are as 
> follows:
> 1)    develop support for input data stream arriving to Apache Samoa via 
> Apache Kafka
> 2)    develop support for output data stream produced by Apache Samoa, 
> including the results of stream mining and forwarded to Apache Kafka to be 
> provided in this way to other modules consuming the stream.
> The goal of this issue is therefore to create the following components:
> 1)    KafkaEntranceProcessor in samoa-api. This entrance processor will be 
> able to accept incoming Kafka stream. It will require KafkaDeserializer 
> interface implementation to be delivered. The role of Deserializer would be 
> to translate incoming Apache Kafka messages into implementation of Instance 
> interface of SAMOA.
> 2)    KafkaDestinationProcessor in samoa-api. Similarly to the 
> KafkaEntranceProcessor, this processor would require KafkaSerializer 
> interface implementation to be delivered. The role of Serializer would be to 
> create a Kafka message from the underlying Instance class.
> 3)    KafkaStream, as the extension to existing streams (e.g. 
> InstanceStream), would take similar role to other streams, and will provide 
> the control over Instances flows in the entire topology.
> Moreover, the following assumptions are considered:
> 1)    Components would be implemented with the use of most up-to-date version 
> of Apache Kafka, i.e. 0.10
> 2)    Samples of aforementioned Serializer and Deserializer would be 
> delivered, both supporting AVRO and JSON serialization of Instance objects.
> 3)    Sample testing classes providing reference use of Kafka source and 
> destination would be included in the project as well.
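
The serializer/deserializer pair described in components 1) and 2) could look 
roughly like the sketch below. Only the interface names KafkaDeserializer and 
KafkaSerializer come from the description above; the method signatures, the 
SimpleInstance stand-in for SAMOA's Instance, and the comma-separated encoding 
are assumptions made for illustration (the issue itself calls for AVRO and JSON 
samples).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for SAMOA's Instance: attribute values plus a class value.
final class SimpleInstance {
    final double[] attributes;
    final double classValue;

    SimpleInstance(double[] attributes, double classValue) {
        this.attributes = attributes.clone();
        this.classValue = classValue;
    }
}

// Assumed shape: turns one incoming Kafka message into an instance.
interface KafkaDeserializer<T> {
    T deserialize(byte[] message);
}

// Assumed shape: turns an instance into one outgoing Kafka message.
interface KafkaSerializer<T> {
    byte[] serialize(T value);
}

// Toy codec: comma-separated attribute values, class value last.
class CsvInstanceCodec
        implements KafkaSerializer<SimpleInstance>, KafkaDeserializer<SimpleInstance> {

    @Override
    public byte[] serialize(SimpleInstance value) {
        StringBuilder sb = new StringBuilder();
        for (double attribute : value.attributes) {
            sb.append(attribute).append(',');
        }
        sb.append(value.classValue);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public SimpleInstance deserialize(byte[] message) {
        String[] parts = new String(message, StandardCharsets.UTF_8).split(",");
        double[] attributes = new double[parts.length - 1];
        for (int i = 0; i < attributes.length; i++) {
            attributes[i] = Double.parseDouble(parts[i]);
        }
        return new SimpleInstance(attributes, Double.parseDouble(parts[parts.length - 1]));
    }
}
```

Keeping the two directions as separate interfaces lets an entrance processor 
depend only on deserialization and a destination processor only on 
serialization, while a single class can still implement both.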



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
