Thanks TD.
On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das
wrote:
> I don't think there is any easier way.
>
> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande
> wrote:
>
>> Thanks TD for the response. I forgot to mention that I am not using
I don't think there is any easier way.
On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande
wrote:
> Thanks TD for the response. I forgot to mention that I am not using
> structured streaming.
>
> I was looking into KafkaUtils.createRDD, and looks like I need to get the
>
Thanks TD for the response. I forgot to mention that I am not using
structured streaming.
I was looking into KafkaUtils.createRDD, and it looks like I need to get the
earliest and the latest offsets for each partition to build the
Array(offsetRange). I wanted to know if there was an easier way.
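For what it's worth, building that Array(offsetRange) can be sketched roughly as below. This is a hedged sketch, not code from the thread: it assumes the spark-streaming-kafka-0-10 artifact, a running SparkContext `sc`, and placeholder topic/broker names; the earliest/latest offsets per partition are looked up with a plain KafkaConsumer.

```scala
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

val topic = "mytopic" // placeholder
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092", // placeholder
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "batch-read-example"
)

// Discover each partition's earliest and latest offsets with a bare consumer.
val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
val partitions = consumer.partitionsFor(topic).asScala
  .map(pi => new TopicPartition(topic, pi.partition))
val begin = consumer.beginningOffsets(partitions.asJava).asScala
val end   = consumer.endOffsets(partitions.asJava).asScala
consumer.close()

// One OffsetRange per partition, covering everything currently in the topic.
val offsetRanges: Array[OffsetRange] = partitions.map { tp =>
  OffsetRange(tp, begin(tp), end(tp))
}.toArray

val rdd = KafkaUtils.createRDD[String, String](
  sc, kafkaParams.asJava, offsetRanges, LocationStrategies.PreferConsistent)
```

The offset lookup is the boilerplate the question is complaining about, which is why the DataFrame-based batch read suggested later in the thread is the simpler route.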
It's best to use DataFrames. You can read from Kafka either as a stream or as a batch.
More details here:
https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
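Following the linked docs, a batch read over a topic's full history can be sketched as below (topic name and bootstrap servers are placeholders, not from the thread):

```scala
// Batch read: processes exactly the data available at the time of the query.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
  .option("subscribe", "mytopic")                      // placeholder
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// Kafka key/value arrive as binary; cast to strings for inspection.
val records = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```

No manual per-partition offset bookkeeping is needed; `startingOffsets`/`endingOffsets` cover all 10 partitions at once.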
Hi all,
What is the easiest way to read all the data from kafka in a batch program
for a given topic?
I have 10 kafka partitions, but the data is not much. I would like to read
from the earliest from all the partitions for a topic.
I appreciate any help. Thanks
I think there is really no good reason for this limitation.
On Mon, Aug 7, 2017 at 2:58 AM, Jacek Laskowski wrote:
> Hi,
>
> While exploring checkpointing with kafka source and console sink I've
> got the exception:
>
> // today's build from the master
> scala> spark.version
>
hi folks,
I was debugging a Spark job that ended up with too few partitions during
the join step and thought I'd reach out to understand if this is the right
behavior / what typical workarounds are.
I have two RDDs that I'm joining: one with a lot of partitions (5K+) and
one with far fewer
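If I'm reading the (truncated) question right, this is the defaultPartitioner behavior in Spark 2.2 and earlier: when one side of a join already carries a partitioner, that partitioner (and its partition count) is reused for the result, even if the other side has far more partitions. A hedged sketch of the usual workarounds, with illustrative RDD names:

```scala
import org.apache.spark.HashPartitioner

// Illustrative: `big` has ~5000 partitions and no partitioner; `small` has,
// say, 10 partitions and carries a HashPartitioner from an earlier shuffle
// (e.g. reduceByKey). A plain big.join(small) can then come out with only 10
// partitions, because the existing partitioner is reused.

// Workaround 1: ask for the partition count explicitly.
val joined = big.join(small, 5000)

// Workaround 2: pass the partitioner you actually want.
val joined2 = big.join(small, new HashPartitioner(5000))
```

Either form forces the join's shuffle to the larger partition count instead of inheriting the smaller RDD's.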
I am doing that already for all known messy data. Thanks Cody for all your
time and input
On Mon, Aug 7, 2017 at 11:58 AM, Cody Koeninger wrote:
> Yes
>
> On Mon, Aug 7, 2017 at 12:32 PM, shyla deshpande
> wrote:
> > Thanks Cody again.
> >
> > No.
Spark sample submitted with cluster deploy-mode (./bin/run-example --master
spark://localhost:6066 --deploy-mode=cluster SparkPi 10) throws the following
error; does anyone know what the problem is?
==
17/08/07 16:27:39 ERROR RestSubmissionClient: Exception from the cluster:
Yes
On Mon, Aug 7, 2017 at 12:32 PM, shyla deshpande
wrote:
> Thanks Cody again.
>
> No. I am mapping the Kafka ConsumerRecord so that I can save it in
> the Cassandra table; saveToCassandra is an action, and my data does get
> saved into Cassandra. It is
Thanks a lot for the quick pointer!
So, is the advice I linked to in the official Spark 2.2 documentation
misleading? Are you saying that Spark 2.2 does not use Java
serialization? And is the tip to switch to Kryo also outdated?
Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286
For Dataframe (and Dataset), cache() already uses fast
serialization/deserialization with data compression schemes.
We already identified some performance issues regarding cache(). We are
working on alleviating these issues in
https://issues.apache.org/jira/browse/SPARK-14098.
We expect that
Thanks Cody again.
No. I am mapping the Kafka ConsumerRecord so that I can save it in
the Cassandra table; saveToCassandra is an action, and my data does get
saved into Cassandra. It is working as expected 99% of the time, except that
when there is an exception, I did not want the
Hi,
I'm using Spark 2.2, and have a big batch job, using dataframes (with
built-in, basic types). It references the same intermediate dataframe
multiple times, so I wanted to try to cache() that and see if it helps,
both in memory footprint and performance.
Now, the Spark 2.2 tuning page (
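A minimal illustration of the pattern being described here, i.e. caching an intermediate DataFrame that several downstream queries reuse (the source path and column names are placeholders, not from the thread):

```scala
import spark.implicits._

// The reused intermediate: cached so it is materialized on the first action
// and served from memory/disk for every later reference.
val intermediate = spark.read.parquet("/data/events") // placeholder source
  .filter($"status" === "ok")
  .cache()

// Both queries below reuse the cached result instead of recomputing the scan
// and filter.
val byDay  = intermediate.groupBy("day").count()
val byUser = intermediate.groupBy("user").count()
```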
If literally all you are doing is rdd.map I wouldn't expect
saveToCassandra to happen at all, since map is not an action.
Filtering for unsuccessful attempts and collecting those back to the
driver would be one way for the driver to know whether it was safe to
commit.
On Mon, Aug 7, 2017 at
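A hedged sketch of what Cody is suggesting (names are illustrative, not from the thread): have each task report whether its write attempt succeeded, pull the failed records back to the driver, and only commit offsets when there are none.

```scala
import scala.util.control.NonFatal

// `saveRecord` stands in for whatever per-record write the job does
// (the thread's actual write is saveToCassandra); returns success/failure.
def saveRecord(r: String): Boolean =
  try {
    // write r to the external store here
    true
  } catch {
    case NonFatal(_) => false
  }

// Tag each record with its write outcome, then collect only the failures.
val attempts = rdd.map(r => (r, saveRecord(r)))
val failures = attempts.filter { case (_, ok) => !ok }.map(_._1).collect()

if (failures.isEmpty) {
  // all writes succeeded: safe to commit the Kafka offsets for this batch
} else {
  // retry or log `failures` before committing anything
}
```

Note that `collect()` is the action that actually triggers the writes; a bare `rdd.map(...)` with no action, as Cody points out above, would do nothing.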
(Also asked on SO at https://stackoverflow.com/q/45543140/416300)
I am trying to migrate table definitions from one Hive metastore to another.
The source cluster has:
- Spark 1.6.0
- Hive 1.1.0 (cdh)
- HDFS
The destination cluster is an EMR cluster with:
- Spark 2.1.1
- Hive
Hi,
While exploring checkpointing with kafka source and console sink I've
got the exception:
// today's build from the master
scala> spark.version
res8: String = 2.3.0-SNAPSHOT
scala> val q = records.
| writeStream.
| format("console").
| option("truncate", false).
|
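For context, the truncated query above is presumably being assembled along these lines; this is a sketch only, with a placeholder checkpoint path, and is not the verbatim code from the message:

```scala
// In Spark 2.2-era builds, giving the console sink a checkpoint location and
// restarting is what runs into the checkpoint-recovery limitation this
// thread is discussing.
val q = records.writeStream
  .format("console")
  .option("truncate", false)
  .option("checkpointLocation", "/tmp/console-checkpoint") // placeholder path
  .start()
```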