I'm trying to parse an RDD[Seq[String]] to a DataFrame.
Although it's a Seq of Strings, the values could have a more specific type such as Int,
Boolean, Double, String and so on.
For example, a line could be:
"hello", "1", "bye", "1.1"
"hello1", "11", "bye1", "2.1"
...
First column is going to be always a
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/05/28 11:11:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 369
On Tue, May 28, 2019 at 12:12, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) wrote:
I'm executing a load process into HBase with Spark (around 150M records).
At the end of the process there are a lot of failed tasks.
I get this error:
19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location
org.apache.hadoop.hbase.TableNotFoundException: my_table
at
I have a process in Spark Streaming whose batches last 2 seconds. When I check
where the time is spent I see about 0.8s-1s of processing time, although the
global time is 2s. This extra second is spent in the driver.
I reviewed the code which is executed by the driver and I commented out some of
this code with
:lib/kafka-clients-0.10.0.1.jar \
--files conf/${1}Conf.json iris-core-0.0.1-SNAPSHOT.jar conf/${1}Conf.json
On Wed, Sep 5, 2018 at 11:11, Guillermo Ortiz Fernández (<
guillermo.ortiz.f...@gmail.com>) wrote:
I want to execute my processes in cluster mode. As I don't know where the
driver will be executed, I have to make available all the files it needs. I
understand that there are two options: copy all the files to all nodes, or
copy them to HDFS.
My doubt is, if I want to put all the files in HDFS, isn't
I can't... do you think that it could be a bug in this version of
Spark or Kafka?
On Wed, Aug 29, 2018 at 22:28, Cody Koeninger ()
wrote:
> Are you able to try a recent version of Spark?
>
> On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández
> wrote:
I got this error from the Spark driver; it seems that I should increase the
driver memory, although it's already 5 GB (and 4 cores) right now. It seems
weird to me because I'm not using Kryo or broadcast in this process, but in
the log there are references to Kryo and broadcast.
How could I figure out the
I'm using Spark Streaming 2.0.1 with Kafka 0.10. Sometimes I get this
exception and Spark dies.
I couldn't see any error or problem among the machines. Does anybody know the
reason for this error?
java.lang.IllegalStateException: This consumer has already been closed.
at
Hello,
I want to set data in a broadcast (Map) variable in Spark.
Sometimes there are new data, so I have to update/refresh the values, but I'm
not sure how I could do this.
My idea is to use accumulators as a flag when a cache error occurs; at
this point I could read the data and reload the
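A common pattern for this is a refreshable wrapper around a loader function; in Spark one would hold a `Broadcast[Map[...]]` reference, call `unpersist()` on the old value, and call `sc.broadcast` with the freshly loaded data. A Spark-free stand-in for the refresh logic (class and method names are illustrative, not from the original thread):

```scala
// Plain-Scala stand-in for a refreshable broadcast: `load` plays the role
// of re-reading the source data, and `refresh()` the role of unpersisting
// the old broadcast and creating a new one.
class RefreshableMap[K, V](load: () => Map[K, V]) {
  @volatile private var current: Map[K, V] = load()
  def get(k: K): Option[V] = current.get(k)
  def refresh(): Unit = { current = load() }
}
```

The flag (e.g. an accumulator checked at the start of each batch) would simply decide when the driver calls `refresh()`.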
I'm consuming data from Kafka with createDirectStream and storing the
offsets in Kafka (
https://spark.apache.org/docs/2.1.0/streaming-kafka-0-10-integration.html#kafka-itself
)
val stream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)
I want to measure how long different transformations in Spark take,
such as map, joinWithCassandraTable and so on. Which one is the best
approximation to do it?
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  println(s"Elapsed time: ${(t1 - t0) / 1e9} s")
  result
}
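One caveat worth noting (not stated in the original message): Spark transformations are lazy, so wrapping a `map` or `joinWithCassandraTable` call by itself only times DAG construction; the timer should wrap an action such as `count()`. A self-contained sketch of the wrapper in use, with a plain Scala collection standing in for an RDD action:

```scala
// Same call-by-name timing wrapper as above, repeated so this sketch
// compiles on its own.
def time[R](block: => R): R = {
  val t0 = System.nanoTime()
  val result = block
  val t1 = System.nanoTime()
  println(s"Elapsed: ${(t1 - t0) / 1e6} ms")
  result
}

// Time the whole evaluation, the way one would wrap rdd.count() in Spark.
val total = time { (1 to 1000).map(_ * 2).sum }
```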