I made a JIRA for it - https://issues.apache.org/jira/browse/SPARK-23539
Unfortunately it is blocked by the Kafka version upgrade, which has a few
nasty issues related to Kafka bugs -
https://issues.apache.org/jira/browse/SPARK-18057
On Wed, Feb 28, 2018 at 3:17 PM, karthikus wrote:
Hi all,
I have the following code to stream Kafka data and apply a schema called
"hbSchema" on it and then act on the data.
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "10.102.255.241:9092")
  .option("subscribe", "mabr_avro_json1")
  .load()
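Applying hbSchema to the Kafka payload is typically done with `from_json` on the `value` column. A minimal self-contained sketch (the schema fields and the sample JSON are hypothetical stand-ins, since the real hbSchema is not shown):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

val spark = SparkSession.builder().master("local[*]").appName("hb-parse").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for hbSchema
val hbSchema = new StructType().add("id", StringType).add("ts", LongType)

// Kafka delivers the payload as a binary `value` column; this DataFrame
// stands in for the stream so the parsing step can be shown in isolation
val raw = Seq("""{"id":"a1","ts":1}""").toDF("value")

// Cast the bytes to string, parse with the schema, then flatten the struct
val parsed = raw
  .select(from_json(col("value").cast("string"), hbSchema).as("hb"))
  .select("hb.*")
```

On the streaming df itself the same `select` works unchanged, since `from_json` is an ordinary column expression.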
Hi guys,
I have an RDD of Thrift structs. I want to convert it into a DataFrame. Can
someone suggest how I can do this?
Thanks
Nikhil
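One common pattern, sketched below: mirror the Thrift struct's fields in a case class, map the RDD, and call `toDF`. (`ThriftPerson`, `Person`, and their fields are hypothetical; substitute the real generated Thrift class.)

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical stand-in for a generated Thrift class
class ThriftPerson(val name: String, val age: Int) extends Serializable

// Spark can derive a schema from a case class but not from a Thrift class,
// so mirror the fields you need in a plain case class
case class Person(name: String, age: Int)

val spark = SparkSession.builder().master("local[*]").appName("thrift-to-df").getOrCreate()
import spark.implicits._

// Map each Thrift object into the case class, then convert to a DataFrame
val thriftRdd = spark.sparkContext.parallelize(Seq(new ThriftPerson("ann", 30)))
val df = thriftRdd.map(t => Person(t.name, t.age)).toDF()
```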
TD,
Thanks for your response.
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
There is no good way to save to parquet without causing downstream
consistency issues.
You could use foreachRDD to get each RDD, convert it to DataFrame/Dataset,
and write out as parquet files. But you will later run into issues with
partial files caused by failures, etc.
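The foreachRDD approach can be sketched like this; the snippet uses a concrete RDD standing in for one micro-batch, i.e. the `rdd` you would receive inside `kafkaStream.foreachRDD { rdd => ... }` (the output path is hypothetical, and the partial-file caveat above still applies):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("batch-to-parquet").getOrCreate()
import spark.implicits._

// Stand-in for one micro-batch RDD of Kafka record values
val rdd = spark.sparkContext.parallelize(Seq("m1", "m2"))

// Convert the batch to a DataFrame and append it as parquet files
val outDir = java.nio.file.Files.createTempDirectory("kafka-parquet").toString + "/out"
rdd.toDF("value").write.mode(SaveMode.Append).parquet(outDir)
```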
On Wed, Feb 28, 2018 at
Yeah, without actually seeing what's happening on that line, it'd be
difficult to say for sure.
You can check what patches HortonWorks applied, and/or ask them.
And yeah, seg fault is totally possible on any size of the data. But you
should've seen it in the `stdout` (assuming that the regular
I don’t think sql context is “deprecated” in this sense. It’s still accessible
for code written against earlier versions of Spark.
But yes, at first glance it looks like you are correct. I don’t see a
recordWriter method for parquet outside of the SQL package.
Hi Vadim, thanks. I use the HortonWorks package. I don't think there are any
seg faults, as the DataFrame I am trying to write is very small in size. Can
it still cause a seg fault?
Who's your Spark provider? EMR, Azure, Databricks, etc.? Maybe contact
them, since they've probably applied some patches.
Also, have you checked `stdout` for segfaults? I vaguely remember
getting `Task failed while writing rows at` and seeing some segfaults that
caused that.
On Wed, Feb 28,
Hi all,
I have Kafka stream data and I need to save it in parquet format without
using Structured Streaming (due to the lack of Kafka message header
support).
val kafkaStream =
KafkaUtils.createDirectStream(
streamingContext,
Hi, thanks Vadim, you are right. I already looked at line 468; I don't see
any code there, it is just a comment. Yes, I am sure I am using all spark-*
jars built for Spark 2.2.0 and Scala 2.11. Unfortunately I am still stuck
with these errors and not sure how to solve them.
I'm sorry, didn't see `Caused by:
java.lang.NullPointerException at
org.apache.spark.sql.SparkSession$$anonfun$3.apply(SparkSession.scala:468)`
Are you sure that you use 2.2.0?
I don't see any code on that line
Hi, thanks for the reply. I only see the NPE and "Task failed while writing
rows" all over the place. I don't see any other errors except the
SparkException "job aborted", followed by the two exceptions I pasted earlier.
There should be another exception trace (basically, the actual cause) after
this one, could you post it?
On Wed, Feb 28, 2018 at 1:39 PM, unk1102 wrote:
> Hi I am getting the following exception when I try to write DataFrame using
> the following code. Please guide. I am
Hi, I am getting the following exception when I try to write a DataFrame
using the following code. Please guide. I am using Spark 2.2.0.
df.write.format("parquet").mode(SaveMode.Append);
org.apache.spark.SparkException: Task failed while writing rows at
Hi,
Is there a good approach to reduce shuffling in Spark? I have two tables
and both of them are large. Any suggestions? I have only seen broadcast
joins suggested, but that will not work in my case.
Thanks,
Asmath
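When broadcast is out, one option is to pre-bucket both tables on the join key so the later sort-merge join can skip the exchange. A sketch with tiny hypothetical tables (the table names, the `id` key, and the bucket count are all assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("bucketed-join").getOrCreate()
import spark.implicits._

// Tiny stand-ins for the two large tables
val a = Seq((1, "x"), (2, "y")).toDF("id", "va")
val b = Seq((1, "p"), (2, "q")).toDF("id", "vb")

// Persist both sides bucketed (and sorted) on the join key; Spark can then
// join the saved tables without shuffling either side
a.write.bucketBy(8, "id").sortBy("id").mode("overwrite").saveAsTable("a_bkt")
b.write.bucketBy(8, "id").sortBy("id").mode("overwrite").saveAsTable("b_bkt")

val joined = spark.table("a_bkt").join(spark.table("b_bkt"), "id")
```

The bucket count should match the parallelism you want at join time, and both tables must use the same count for the exchange to be elided.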
Does anyone know a way to do this right now? As best as I can tell, I need a
custom expression to pass to udf to do this.
I just finished a protobuf encoder, and it feels like Expression is not meant
to be public (a good amount of things are private[sql]); am I wrong about
this? Am I looking at the right
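Short of a custom Catalyst Expression, a plain `udf` returning `Array[Byte]` stays within the public API. A sketch (the encoder body is a placeholder; a real version would call the generated protobuf class's serializer instead):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").appName("encode-udf").getOrCreate()
import spark.implicits._

// Placeholder encoder: stands in for e.g. msg.toByteArray on a protobuf object
val encode = udf((s: String) => s.getBytes("UTF-8"))

// The UDF's Array[Byte] result is stored as a BinaryType column
val out = Seq("hello").toDF("msg").select(encode(col("msg")).as("bytes"))
```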