Hi Kevin.
When you say Kafka interacts with an Oracle database (if I understand you
correctly), are you using GoldenGate with the Kafka interface to push data
from Oracle to Kafka?
HTH
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> Does a Parquet file have a size limit (1 TB)?
I didn't see any problem, but 1 TB is too big to operate on; you need to
divide it into smaller pieces.
> Should we use SaveMode.Append for a long-running streaming app?
Yes, but you need to partition it by time so it is easy to maintain, like
update or delete a
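The partition-by-time advice above can be sketched as follows. This is a minimal illustration, not code from the thread: the base path and the Hive-style `date=`/`hour=` layout are assumptions, but the idea is that each appended micro-batch lands in its own partition directory, which later queries can prune and which can be updated or deleted as a whole directory.

```python
from datetime import datetime, timezone

def partition_path(base: str, ts: datetime) -> str:
    """Build a time-partitioned output path (Hive-style key=value dirs).

    Appending each micro-batch under its own date/hour partition keeps
    individual Parquet files at a manageable size and lets you update or
    delete old data by replacing or removing whole partition directories.
    """
    return f"{base}/date={ts:%Y-%m-%d}/hour={ts:%H}"

# Example: a micro-batch processed at 2016-07-18 09:15 UTC
ts = datetime(2016, 7, 18, 9, 15, tzinfo=timezone.utc)
print(partition_path("/data/events", ts))
# /data/events/date=2016-07-18/hour=09
```

With Spark's DataFrame writer the same layout falls out of `partitionBy("date", "hour")` combined with append mode, so the sketch only makes the directory scheme explicit.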
Hi Mich,
My stack is as follows:
Data sources:
* IBM MQ
* Oracle database
Kafka stores all messages from the data sources.
Spark Streaming fetches messages from Kafka, does a bit of transformation,
and writes Parquet files to HDFS.
Hive / SparkSQL / Impala will query the Parquet files.
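The pipeline described above (consume → light transform → append into time-partitioned files) has roughly this shape. The sketch below is a toy using only the standard library, with JSON files standing in for Kafka, Spark Streaming, and Parquet; the field names and paths are invented for illustration, not taken from the actual stack.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def transform(msg: dict) -> dict:
    # Stand-in for the "bit of transformation" step:
    # normalize field names and trim whitespace.
    return {"source": msg["src"], "payload": msg["body"].strip()}

def write_batch(base: str, ts: datetime, records: list) -> str:
    # Stand-in for the Parquet append: each micro-batch lands in a
    # time-partitioned directory so Hive/Impala can prune by date/hour.
    out_dir = os.path.join(base, f"date={ts:%Y-%m-%d}", f"hour={ts:%H}")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"part-{ts:%M%S}.json")
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return path

# Simulated messages from one micro-batch (sources from the stack above)
batch = [{"src": "ibm-mq", "body": " hello "}, {"src": "oracle", "body": "row1"}]
ts = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
base = tempfile.mkdtemp()
out = write_batch(base, ts, [transform(m) for m in batch])
print(out)
```

In the real stack, the `transform`/`write_batch` pair corresponds to the per-batch work inside the Spark Streaming job, with the DataFrame writer producing Parquet instead of JSON.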
Do you have any
Hi,
Can you explain your particular stack?
For example, what is the source of the streaming data, and what role does
Spark play? Are you dealing with real time and batch, and why Parquet
rather than something like HBase to ingest data in real time?
HTH
Dr Mich Talebzadeh
Hi,
Does anyone know the best practices for storing data in Parquet files?
Does a Parquet file have a size limit (1 TB)?
Should we use SaveMode.Append for a long-running streaming app?
How should we store the data in HDFS (directory structure, ...)?
Thanks,
Kevin.