Refer to http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
2015-05-06 12:22 GMT+08:00 MrAsanjar . afsan...@gmail.com:
Why not try https://github.com/linkedin/camus? Camus is a Kafka-to-HDFS
pipeline.
On Tue, May 5, 2015 at 11:13 PM, Rendy Bambang Junior
rendy.b.jun
Hi all,
I am planning to load data from Kafka to HDFS. Is it normal to use Spark
Streaming to load data from Kafka to HDFS? What are the concerns with doing
this? There is no processing to be done by Spark; it only needs to store the
data from Kafka in HDFS, for storage and for further Spark processing.
Rendy
at the join section in the streaming programming
guide?
http://spark.apache.org/docs/latest/streaming-programming-guide.html#stream-dataset-joins
On Wed, Apr 29, 2015 at 7:11 AM, Rendy Bambang Junior
rendy.b.jun...@gmail.com wrote:
Let's say I have transaction data and visit data
visit
| userId
Let's say I am storing my data in HDFS with the folder structure and file
partitioning shown below:
/analytics/2015/05/02/partition-2015-05-02-13-50-
Note that a new file is created every 5 minutes.
As per my understanding, storing 5-minute files means we cannot create an
RDD more granular than
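
To make the layout above concrete, here is a minimal Python sketch that maps a timestamp to the path prefix of the 5-minute file that would contain it. It assumes the trailing "13-50" in the example path is the hour and minute at which the 5-minute window starts, and that the bucket size divides 60 evenly. The function names `partition_prefix` and `prefixes_between` are made up for illustration, and the part of the file name after the final "-" is left out because it is not shown in the message.

```python
from datetime import datetime, timedelta


def floor_to_bucket(ts: datetime, minutes: int = 5) -> datetime:
    """Round ts down to the start of its bucket (assumes minutes divides 60)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def partition_prefix(ts: datetime, root: str = "/analytics",
                     minutes: int = 5) -> str:
    """Hypothetical helper: path prefix of the file whose window contains ts."""
    bucket = floor_to_bucket(ts, minutes)
    return root + bucket.strftime("/%Y/%m/%d/partition-%Y-%m-%d-%H-%M-")


def prefixes_between(start: datetime, end: datetime,
                     root: str = "/analytics", minutes: int = 5) -> list:
    """Prefixes of every bucket that overlaps the half-open range [start, end)."""
    step = timedelta(minutes=minutes)
    current = floor_to_bucket(start, minutes)
    prefixes = []
    while current < end:
        prefixes.append(partition_prefix(current, root, minutes))
        current += step
    return prefixes


if __name__ == "__main__":
    print(partition_prefix(datetime(2015, 5, 2, 13, 52, 30)))
    for p in prefixes_between(datetime(2015, 5, 2, 13, 52),
                              datetime(2015, 5, 2, 14, 3)):
        print(p)
```

Under these assumptions, a time range can only be narrowed down to whole files at load time. Appending a wildcard to each prefix and joining the results with commas would give a path string of the kind `SparkContext.textFile` accepts, and anything finer than 5 minutes would then have to be filtered on a timestamp inside the records.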
Let's say I have transaction data and visit data
visit
| userId | visit source | timestamp |
| A      | google ads   | 1         |
| A      | facebook ads | 2         |
transaction
| userId | total price | timestamp |
| A      | 100         | 248384    |
| B      | 200         | 43298739  |
I
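
The message is cut off before the actual question, so the intended computation is not known. Assuming the goal is to combine the two tables on userId, which is what the join pointer in the reply suggests, a plain-Python sketch of an inner join over the sample rows could look like this. The field names and the `join_on_user` helper are illustrative only, and this is ordinary Python rather than Spark code.

```python
from collections import defaultdict

# Sample rows copied from the tables in the message.
visits = [
    {"userId": "A", "visit_source": "google ads", "timestamp": 1},
    {"userId": "A", "visit_source": "facebook ads", "timestamp": 2},
]
transactions = [
    {"userId": "A", "total_price": 100, "timestamp": 248384},
    {"userId": "B", "total_price": 200, "timestamp": 43298739},
]


def join_on_user(visits, transactions):
    """Inner join: one output row per (visit, transaction) pair sharing a userId."""
    # Index visits by user so each transaction is matched with a dict lookup.
    visits_by_user = defaultdict(list)
    for v in visits:
        visits_by_user[v["userId"]].append(v)

    joined = []
    for t in transactions:
        for v in visits_by_user.get(t["userId"], []):
            joined.append((t["userId"], v["visit_source"], t["total_price"]))
    return joined


if __name__ == "__main__":
    for row in join_on_user(visits, transactions):
        print(row)
```

With the sample data, user A's transaction is paired with both of A's visits, and user B's transaction is dropped because B has no visit rows. That is the usual behaviour of an inner join; keeping B would need an outer join instead.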