unsubscribe
On Wed, May 26, 2021 at 12:31 AM, Eric Beabes wrote:
> I keep getting the following exception when I am trying to read a Parquet
> file from a Path on S3 in Spark/Scala. Note: I am running this on EMR.
>
> java.lang.NullPointerException
> at
>
Hi,
I know it, but my purpose is to transform the JSON strings in a Dataset
into a structured Dataset, while spark.readStream only supports reading JSON
files from a specified path.
https://stackoverflow.com/questions/48617474/how-to-convert-json-dataset-to-dataframe-in-spark-structured-streaming
gives an essential
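A rough sketch of that approach (my code; jsonDs, the schema and its fields are
assumptions): parse the JSON strings with from_json, since spark.read().json
cannot be applied inside a streaming query.

  import org.apache.spark.sql.functions.from_json
  import org.apache.spark.sql.types.{LongType, StringType, StructType}
  import spark.implicits._

  // schema of the JSON payload (placeholder fields)
  val schema = new StructType().add("id", LongType).add("name", StringType)

  // jsonDs: Dataset[String] holding one JSON document per row (column "value")
  val parsed = jsonDs
    .select(from_json($"value", schema).as("data"))
    .select("data.*")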
Have you read through the documentation of Structured Streaming?
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
One of the basic mistakes you are making is defining the dataset with
`spark.read()`. You define a streaming Dataset with `spark.readStream()`.
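A minimal illustration of the difference (path and schema are placeholders; a
schema is required for streaming file sources):

  val batchDs  = spark.read.schema(schema).json("/path/to/json")        // batch Dataset
  val streamDs = spark.readStream.schema(schema).json("/path/to/json")  // streaming Dataset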
On
Hi, Tathagata
I have tried Structured Streaming, but the line
> Dataset rowDataset = spark.read().json(jsondataset);
always throws
> Queries with streaming sources must be executed with writeStream.start()
But what I need to do in this step is only transforming the JSON string data
into a Dataset.
It's not very surprising that doing this sort of RDD to DF conversion
inside DStream.foreachRDD has weird corner cases like this. In fact, you
are going to have additional problems with partial parquet files (when
there are failures) in this approach. I strongly suggest that you use
Structured
getAs is defined as:
def getAs[T](i: Int): T = get(i).asInstanceOf[T]
and when you do toString you call Object.toString, which doesn't depend on
the type,
so asInstanceOf[T] gets dropped by the compiler, i.e.
row.getAs[Int](0).toString -> row.get(0).toString
we can confirm that by writing a simple
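For instance, a small snippet along those lines (my sketch, not from the
original mail):

  import org.apache.spark.sql.Row

  val row = Row("42")                    // field 0 is actually a String
  // asInstanceOf[Int] is erased and toString resolves on Object, so this
  // behaves exactly like row.get(0).toString and prints "42" without error
  println(row.getAs[Int](0).toString)
  // Actually using the value as an Int does fail at runtime:
  // val n: Int = row.getAs[Int](0)     // java.lang.ClassCastException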
Hi
The question is getting to the list.
I have no experience in HBase... though, having seen similar stuff when
saving a DF somewhere else... it might have to do with the properties you
need to set to let Spark know it is dealing with HBase? Don't you need to set
some properties on the Spark
Hi guys - I am not sure whether the email is reaching the community members.
Can somebody please acknowledge
Sent from my iPhone
> On 30-Sep-2017, at 5:02 PM, Debabrata Ghosh wrote:
>
> Dear All,
> Greetings! I am repeatedly hitting a
I was able to resolve the serialization issue. The root cause was that I was
accessing the config values within foreachRDD{}.
The solution was to extract the values from the config outside the foreachRDD
scope and pass the values into the loop directly. Probably something obvious,
as we cannot have nested
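For reference, a rough sketch of the shape of that fix (the config keys,
stream and output path are hypothetical):

  // read plain values from the (non-serializable) config on the driver
  val topic: String   = config.getString("kafka.topic")
  val outPath: String = config.getString("output.path")

  stream.foreachRDD { (rdd, time) =>
    // only the two Strings are captured by the closure, not `config` itself
    rdd.map(record => s"$topic\t$record")
       .saveAsTextFile(s"$outPath/batch-${time.milliseconds}")
  }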
That looks like a classpath problem. You should not have to include
the kafka_2.10 artifact in your pom, spark-streaming-kafka_2.10
already has a transitive dependency on it. That being said, 0.8.2.1
is the correct version, so that's a little strange.
How are you building and submitting your
Also, just to keep it simple, I am trying to use 1.6.0-cdh5.7.0 in the
pom.xml, as the cluster I am trying to run on is CDH 5.7.0 with Spark 1.6.0.
Here is my pom setting:
<cdh.spark.version>1.6.0-cdh5.7.0</cdh.spark.version>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${cdh.spark.version}</version>
  <scope>compile</scope>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
Which Scala version / Spark release are you using ?
Cheers
On Wed, Jun 22, 2016 at 8:20 PM, Sunita Arvind
wrote:
> Hello Experts,
>
> I am getting this error repeatedly:
>
> 16/06/23 03:06:59 ERROR streaming.StreamingContext: Error starting the
> context, marking it as
I don't see how that would be possible. I am reading from a live stream of
data through kafka.
On Sat 12 Mar, 2016 20:28 Ted Yu, wrote:
> Interesting.
> If kv._1 was null, shouldn't the NPE have come from getPartition() (line
> 105) ?
>
> Was it possible that records.next()
Interesting.
If kv._1 was null, shouldn't the NPE have come from getPartition() (line
105) ?
Was it possible that records.next() returned null ?
On Fri, Mar 11, 2016 at 11:20 PM, Prabhu Joseph
wrote:
> Looking at ExternalSorter.scala line 192, i suspect some input
I am using the following versions:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.6.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
Which Spark release do you use ?
I wonder if the following may have fixed the problem:
SPARK-8029 Robust shuffle writer
JIRA is down, cannot check now.
On Fri, Mar 11, 2016 at 11:01 PM, Saurabh Guru
wrote:
> I am seeing the following exception in my Spark Cluster every
Looking at ExternalSorter.scala line 192, I suspect some input record has a
null key.
189   while (records.hasNext) {
190     addElementsRead()
191     kv = records.next()
192     map.changeValue((getPartition(kv._1), kv._1), update)
On Sat, Mar 12, 2016 at 12:48 PM, Prabhu Joseph
Looking at ExternalSorter.scala line 192
189   while (records.hasNext) {
        addElementsRead()
        kv = records.next()
        map.changeValue((getPartition(kv._1), kv._1), update)
        maybeSpillCollection(usingMap = true)
      }
On Sat, Mar 12, 2016 at 12:31 PM, Saurabh Guru
wrote:
> I am seeing
I still can't make the logger work inside a map function. I can use
logInfo("") in the main but not in the function. Anyway, I rewrote my
program to use java.util.Date instead of Joda-Time and I don't have the NPE
anymore.
I will stick with this solution for the moment, even if I find java.util.Date
ugly.
Even if log4j didn't work, you can still get some clue by wrapping the
following call with a try block:
currentDate = currentDate.plusDays(1)
catching NPE and rethrowing with an exception that shows the value of
currentDate
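Something along these lines (a sketch; assumes currentDate is a Joda DateTime
inside the map function):

  import org.joda.time.DateTime

  def nextDay(currentDate: DateTime): DateTime =
    try {
      currentDate.plusDays(1)
    } catch {
      case e: NullPointerException =>
        // surface the offending value in the executor log before rethrowing
        throw new IllegalStateException(s"plusDays(1) failed for currentDate=$currentDate", e)
    }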
Cheers
On Thu, Nov 12, 2015 at 1:56 AM, Romain Sagean
I remember us having issues with Joda classes not serializing properly and
coming out null "on the other side" in tasks
On Thu, Nov 12, 2015 at 10:12 AM, Ted Yu wrote:
> Even if log4j didn't work, you can still get some clue by wrapping the
> following call with try block:
In case you need to adjust log4j properties, see the following thread:
http://search-hadoop.com/m/q3RTtJHkzb1t0J66=Re+Spark+Streaming+Log4j+Inside+Eclipse
Cheers
On Tue, Nov 10, 2015 at 1:28 PM, Ted Yu wrote:
> I took a look at
>
Can you show the stack trace for the NPE ?
Which release of Spark are you using ?
Cheers
On Tue, Nov 10, 2015 at 8:20 AM, romain sagean
wrote:
> Hi community,
> I try to apply the function below during a flatMapValues or a map but I
> get a nullPointerException with the
See below a more complete version of the code.
The firstDate (previously minDate) should not be null; I even added an
extra "filter( _._2 != null)" before the flatMap and the error is
still there.
What I don't understand is why I have the error on
dateSeq.last.plusDays and not on
I took a look at
https://github.com/JodaOrg/joda-time/blob/master/src/main/java/org/joda/time/DateTime.java
Looks like the NPE came from line below:
long instant = getChronology().days().add(getMillis(), days);
Maybe catch the NPE and print out the value of currentDate to see if there
is more
Did you try to cache a DataFrame with just a single row?
Do your rows have any columns with null values?
Can you post a code snippet here on how you load/generate the dataframe?
Does dataframe.rdd.cache work?
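A quick sketch of those checks (df is the DataFrame being cached; the column
name is a placeholder):

  df.limit(1).cache().count()                 // does caching a single row already fail?
  df.rdd.cache().count()                      // does caching the underlying RDD work?
  df.filter(df("someCol").isNull).count()     // any nulls left in a suspect column?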
*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
On Thu, Oct 29, 2015 at 4:33
Thanks Romi,
I resized the dataset to 7MB; however, the code shows the NullPointerException
as well.
> Did you try to cache a DataFrame with just a single row?
Yes, I tried. But same problem.
> Do your rows have any columns with null values?
No, I had filtered out null values before caching the
>
> BUT, after changing limit(500) to limit(1000), the code reports
> NullPointerException.
>
I had a similar situation, and the problem was with a certain record.
Try to find which records are returned when you limit to 1000 but not
returned when you limit to 500.
Could it be a NPE thrown from
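One rough way to isolate them (df is the cached DataFrame; note that limit()
without an ordering is not deterministic, so this is only a heuristic):

  val first500  = df.limit(500)
  val first1000 = df.limit(1000)
  first1000.except(first500).show()   // rows present only in the larger sample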
Did you try:
val data = indexed_files.groupByKey
val modified_data = data.map { a =>
  var name = a._2.mkString(",")
  (a._1, name)
}
modified_data.foreach { a =>
  var file = sc.textFile(a._2)
  println(file.count)
}
Thanks
Best Regards
On Wed, Jul 22, 2015 at 2:18 AM, MorEru
Created PR and verified the example given by Justin works with the change:
https://github.com/apache/spark/pull/6793
Cheers
On Wed, Jun 10, 2015 at 7:15 PM, Ted Yu yuzhih...@gmail.com wrote:
Looks like the NPE came from this line:
@transient protected lazy val rng = new XORShiftRandom(seed
Are you calling hiveContext.sql within an RDD.map closure or something
similar? In that case, the call actually happens on the executor side.
However, HiveContext only exists on the driver side.
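A rough illustration (table and column names are made up; rdd is assumed to be
an RDD of ids):

  // Problematic: hiveContext only exists on the driver, so calling it inside
  // a transformation runs on the executors and fails there.
  // rdd.map(id => hiveContext.sql(s"SELECT v FROM t WHERE id = $id"))

  // Instead, run the query on the driver and broadcast the (small) result:
  val lookup = hiveContext.sql("SELECT id, v FROM t")
    .collect()
    .map(r => r.getLong(0) -> r.getString(1))
    .toMap
  val lookupBc = sc.broadcast(lookup)
  val joined = rdd.map(id => (id, lookupBc.value.get(id)))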
Cheng
On 6/4/15 3:45 PM, patcharee wrote:
Hi,
I am using Hive 0.14 and spark 0.13. I got
Look at the trace again. It is a very weird error. The SparkSubmit is running
on the client side, but YarnClusterSchedulerBackend is supposed to run in the
YARN AM.
I suspect you are running the cluster in yarn-client mode, but in
JavaSparkContext you set "yarn-cluster". As a result, spark
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek kartheek.m...@gmail.com
wrote:
Hi,
I get this following Exception when I submit spark application that
calculates the frequency of characters in a file. Especially, when I
increase the size of data, I
spark-1.0.0
On Thu, Jan 1, 2015 at 12:04 PM, Josh Rosen rosenvi...@gmail.com wrote:
Which version of Spark are you using?
On Wed, Dec 31, 2014 at 10:24 PM, rapelly kartheek
kartheek.m...@gmail.com wrote:
Hi,
I get this following Exception when I submit spark application that
calculates
It looks like 'null' might be selected as a block replication peer?
https://github.com/apache/spark/blob/v1.0.0/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L786
I know that we fixed some replication bugs in newer versions of Spark (such
as
Ok. Let me try out on a newer version.
Thank you!!
On Thu, Jan 1, 2015 at 12:17 PM, Josh Rosen rosenvi...@gmail.com wrote:
It looks like 'null' might be selected as a block replication peer?
Could you post the stack trace?
Best Regards,
Shixiong Zhu
2014-12-16 23:21 GMT+08:00 richiesgr richie...@gmail.com:
Hi
This time I need expert.
On 1.1.1 and only in cluster (standalone or EC2)
when I use this code :
countersPublishers.foreachRDD(rdd => {
Cristóvão José Domingues Cordeiro
IT Department - 28/R-018
CERN
--
*From:* Simone Franzini [captainfr...@gmail.com]
*Sent:* 15 December 2014 16:52
*To:* Cristovao Jose Domingues Cordeiro
*Subject:* Re: NullPointerException When Reading Avro Sequence Files
Ok, I
--
*From:* Simone Franzini [captainfr...@gmail.com]
*Sent:* 06 December 2014 15:48
*To:* Cristovao Jose Domingues Cordeiro
*Subject:* Re: NullPointerException When Reading Avro Sequence Files
java.lang.IncompatibleClassChangeError: Found interface
*Subject:* Re: NullPointerException When Reading Avro Sequence Files
Hi Cristovao,
I have seen a very similar issue that I have posted about in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
I think your main issue here is somewhat
Hi all,
I've tried the above example on Gist, but it doesn't work (at least for me).
Did anyone get this:
14/12/05 10:44:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.TaskAttemptContext, but class
Hi TD,
This is actually an important requirement (recovery of shared variables) for
us as we need to spread some referential data across the Spark nodes on
application startup. I just bumped into this issue on Spark version 1.0.1. I
assume the latest one also doesn't include this capability. Are
This is actually very tricky, as there are two pretty big challenges that need
to be solved.
(i) Checkpointing for broadcast variables: Unlike RDDs, broadcast variables
don't have checkpointing support (that is, you cannot write the content of a
broadcast variable to HDFS and recover it automatically
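For context, the workaround usually suggested (and described in the Spark
Streaming programming guide) is to re-create the broadcast lazily through a
singleton after recovering from a checkpoint; a rough sketch with placeholder
data:

  import org.apache.spark.SparkContext
  import org.apache.spark.broadcast.Broadcast

  object ReferenceData {
    @volatile private var instance: Broadcast[Seq[String]] = _

    def getInstance(sc: SparkContext): Broadcast[Seq[String]] = {
      if (instance == null) {
        synchronized {
          if (instance == null) {
            instance = sc.broadcast(Seq("a", "b", "c"))   // load the real data here
          }
        }
      }
      instance
    }
  }

  // inside the stream: look the broadcast up via the singleton so it is
  // re-created after a restart from checkpoint
  // dstream.foreachRDD { rdd =>
  //   val ref = ReferenceData.getInstance(rdd.sparkContext)
  //   rdd.foreach(x => println(ref.value.contains(x)))
  // }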
Looking at the source codes of DStream.scala
/**
* Return a new DStream in which each RDD has a single element generated
by counting each RDD
* of this DStream.
*/
def count(): DStream[Long] = {
this.map(_ => (null, 1L))
-----Original Message-----
From: anoldbrain [mailto:anoldbr...@gmail.com]
Sent: Wednesday, August 20, 2014 4:13 PM
To: u...@spark.incubator.apache.org
Subject: Re: NullPointerException from '.count.foreachRDD'
Looking at the source codes of DStream.scala
/**
* Return a new DStream in which each RDD has
Thank you for the reply. I implemented my InputDStream to return None when
there's no data. After changing it to return an empty RDD, the exception is
gone.
I am curious as to why all other processing worked correctly with my old
incorrect implementation, with or without data? My actual code,
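For reference, a rough sketch of that change in a custom InputDStream (the
element type and the fetch step are placeholders):

  import org.apache.spark.rdd.RDD
  import org.apache.spark.streaming.Time

  override def compute(validTime: Time): Option[RDD[String]] = {
    val batch: Seq[String] = fetchBatch(validTime)     // hypothetical fetch
    if (batch.isEmpty)
      Some(context.sparkContext.emptyRDD[String])      // empty RDD instead of None
    else
      Some(context.sparkContext.parallelize(batch))
  }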
Seems https://issues.apache.org/jira/browse/SPARK-2846 is the jira tracking
this issue.
On Mon, Aug 18, 2014 at 6:26 PM, cesararevalo ce...@zephyrhealthinc.com
wrote:
Thanks, Zhan for the follow up.
But, do you know how I am supposed to set that table name on the jobConf? I
don't have
Thanks! Yeah, it may be related to that. I'll check out that pull request
that was sent and hopefully that fixes the issue. I'll let you know, after
fighting with this issue yesterday I had decided to just leave it on the
side and return to it after, so it may take me a while to get back to you.
Looks like your hiveContext is null. Have a look at this documentation.
https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
Thanks
Best Regards
On Mon, Aug 18, 2014 at 12:09 PM, Cesar Arevalo ce...@zephyrhealthinc.com
wrote:
Hello:
I am trying to setup Spark to
Nope, it is NOT null. Check this out:
scala> hiveContext == null
res2: Boolean = false
And thanks for sending that link, but I had already looked at it. Any other
ideas?
I looked through some of the relevant Spark Hive code and I'm starting to
think this may be a bug.
-Cesar
On Mon, Aug 18,
Then definitely its a jar conflict. Can you try removing this jar from the
class path /opt/spark-poc/lib_managed/jars/org.spark-project.hive/hive-exec/
hive-exec-0.12.0.jar
Thanks
Best Regards
On Mon, Aug 18, 2014 at 12:45 PM, Cesar Arevalo ce...@zephyrhealthinc.com
wrote:
Nope, it is NOT
I removed the JAR that you suggested but now I get another error when I try
to create the HiveContext. Here is the error:
scala> val hiveContext = new HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to
term ql
in package org.apache.hadoop.hive which is not
Looks like hbaseTableName is null, probably caused by incorrect configuration.
String hbaseTableName = jobConf.get(HBaseSerDe.HBASE_TABLE_NAME);
setHTable(new HTable(HBaseConfiguration.create(jobConf),
Bytes.toBytes(hbaseTableName)));
Here is the definition.
public static final
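In case it helps, a rough sketch of setting that property on the JobConf
before handing it over (the constant's value is assumed to be
"hbase.table.name", per HBaseSerDe; the table name is a placeholder):

  import org.apache.hadoop.hbase.HBaseConfiguration
  import org.apache.hadoop.mapred.JobConf

  val jobConf = new JobConf(HBaseConfiguration.create())
  jobConf.set("hbase.table.name", "my_hbase_table")   // HBaseSerDe.HBASE_TABLE_NAME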
Thanks, Zhan for the follow up.
But, do you know how I am supposed to set that table name on the jobConf? I
don't have access to that object from my client driver?
--
View this message in context:
For those curious I used the JavaSparkContext and got access to an
AvroSequenceFile (wrapper around Sequence File) using the following:
file = sc.newAPIHadoopFile("hdfs path to my file",
AvroSequenceFileInputFormat.class, AvroKey.class, AvroValue.class,
new Configuration())
--
View this
I see Spark is using AvroRecordReaderBase, which is used to grab Avro
Container Files, which is different from Sequence Files. If anyone is using
Avro Sequence Files with success and has an example, please let me know.
--
View this message in context:
To be more specific, I'm working with a system that stores data in
org.apache.avro.hadoop.io.AvroSequenceFile format. An AvroSequenceFile is
"A wrapper around a Hadoop SequenceFile that also supports reading and
writing Avro data."
It seems that Spark does not support this out of the box.
--
I got this working locally a little while ago when playing around with
AvroKeyInputFile: https://gist.github.com/MLnick/5864741781b9340cb211
But not sure about AvroSequenceFile. Any chance you have an example
datafile or records?
On Sat, Jul 19, 2014 at 11:00 AM, Sparky gullo_tho...@bah.com
Thanks for the gist. I'm just now learning about Avro. I think when you use
a DataFileWriter you are writing to an Avro Container (which is different
than an Avro Sequence File). I have a system where data was written to an
HDFS Sequence File using AvroSequenceFile.Writer (which is a wrapper
I think you probably want to use `AvroSequenceFileOutputFormat` with
`newAPIHadoopFile`. I'm not even sure that in hadoop you would use
SequenceFileInput format to read an avro sequence file
--
View this message in context:
Thanks for responding. I tried using the newAPIHadoopFile method and got an
IOException with the message "Not a data file".
If anyone has an example of this working I'd appreciate your input or
examples.
What I entered at the repl and what I got back are below:
val myAvroSequenceFile =
Hi Konstantin,
Thanks for reporting this. This happens because there are null keys in your
data. In general, Spark should not throw null pointer exceptions, so this
is a bug. I have fixed this here: https://github.com/apache/spark/pull/1288.
For now, you can work around this by special-handling
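For example, a minimal sketch of such special-handling (pairs is the key-value
RDD being written/checkpointed):

  val cleaned = pairs.filter { case (k, _) => k != null }   // drop records with null keys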
I am also seeing similar problem when trying to continue job using saved
checkpoint. Can somebody help in solving this problem?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/NullPointerException-on-reading-checkpoint-files-tp7306p7507.html
Sent from the