[ https://issues.apache.org/jira/browse/SQOOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088664#comment-14088664 ]
Gwen Shapira commented on SQOOP-1414:
-------------------------------------

The planned syntax will be:

sqoop import --connection kafka:broker://broker_host:broker_port --table-name topic

I currently plan to implement:

First phase:
- No HBase, no Accumulo (a streaming solution makes more sense there)
- Assuming data in Kafka is String
- Single broker in the connect string
- Exactly-once semantics (using SimpleConsumer, checkpointing reads to HDFS)
- Limited to a single topic per Sqoop job
- Mapper per partition (no user control over the number of mappers)

TBD later (possibly only in Sqoop2):
- Avro / Parquet (probably via Kite)
- Hive / HCat integration
- Pluggable decoder
- Specifying the number of mappers
- List of brokers
- List of topics

> Add support for Import from Kafka
> ----------------------------------
>
>                 Key: SQOOP-1414
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1414
>             Project: Sqoop
>          Issue Type: Improvement
>    Affects Versions: 1.4.4
>            Reporter: Gwen Shapira
>            Assignee: Gwen Shapira
>
> Kafka is an important data source for many organizations.
> Support in Sqoop will allow users to easily run MapReduce jobs to read data
> from Kafka topics to HDFS in various formats and to integrate with Hive.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
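The exactly-once guarantee in the first phase rests on checkpointing each partition's consumed offset and committing it only after the data is durably written. A minimal sketch of that idea follows; this is plain illustrative Python, not Sqoop or Kafka code, and every name in it (Checkpoint, consume_partition, the in-memory offset store standing in for an HDFS checkpoint file) is hypothetical:

```python
class Checkpoint:
    """In-memory stand-in for the per-partition offset checkpoint
    the plan would keep in HDFS."""
    def __init__(self):
        self.offsets = {}  # partition -> next offset to read

    def load(self, partition):
        # A fresh partition starts from offset 0.
        return self.offsets.get(partition, 0)

    def save(self, partition, offset):
        self.offsets[partition] = offset


def consume_partition(messages, partition, checkpoint, batch_size=2):
    """Read messages from the checkpointed offset onward.

    The offset is committed only AFTER each batch is written out
    (simulated here by appending to a list), so a failed or re-run
    job resumes from the last committed offset instead of
    re-emitting records -- the exactly-once scheme sketched in the
    comment above.
    """
    written = []
    offset = checkpoint.load(partition)
    while offset < len(messages):
        batch = messages[offset:offset + batch_size]
        written.extend(batch)               # "write to HDFS"
        offset += len(batch)
        checkpoint.save(partition, offset)  # commit after the write
    return written
```

Because the checkpoint survives across runs, re-running the same job against an unchanged topic reads nothing new, which is what lets one mapper per partition resume safely after a failure.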