My apologies,
It was a problem with our Hadoop cluster.
When we tested the same code on another cluster (HDP-based), it worked
without any problem.
```sh
# make Shift-JIS text
cat a.txt
8月データだけでやってみよう
nkf -W -s a.txt > b.txt
cat b.txt
87n%G!<%?$@$1$G$d$C$F$_$h$&
nkf -s -w b.txt
8月データだけでやってみよう
hdfs
```
Dear Spark ML members,
I ran into trouble using the "multiLine" option to load CSV data with
Shift-JIS encoding.
When option("multiLine", true) is specified, option("encoding",
"encoding-name") no longer has any effect.
In CSVDataSource.scala, I found that
From our perspective, we have invested heavily in Kubernetes as our cluster
manager of choice.
We also make quite heavy use of Spark, and we've been experimenting with
these builds (2.1 with PySpark enabled) quite heavily. Given that we've
already 'paid the price' to operate Kubernetes in
Ok thanks
A few more questions:
1. When I looked into the documentation, it says onQueryProgress is not
thread-safe. So would this method be the right place to refresh the cache?
And is there no need to restart the query if I choose the listener?
The methods are not thread-safe as they may be called from different
threads.
Both work. The asynchronous method with the listener will have less
downtime; it's just that the first trigger/batch after the asynchronous
unpersist+persist will probably take longer, as it has to reload the data.
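Stripped of the Spark APIs, the asynchronous approach boils down to building the replacement before swapping it in, so the query never observes a half-refreshed cache. A minimal language-level sketch of that pattern (plain Scala; the names and the `Vector` stand-in are illustrative, not Spark APIs):

```scala
import java.util.concurrent.atomic.AtomicReference

// Stand-in for the cached data frame; refresh builds the new value
// first, then swaps it in atomically, so readers never see a
// half-built state and never block on the reload.
val cached = new AtomicReference[Vector[Int]](Vector(1, 2, 3))

def refresh(reload: => Vector[Int]): Unit = {
  val fresh = reload   // analogous to re-reading + persisting the data
  cached.set(fresh)    // analogous to pointing the query at the new frame
}

refresh(Vector(4, 5, 6))
```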
On Tue, Aug 15, 2017 at 2:29 PM, purna pradeep
wrote:
>
Thanks Tathagata Das, actually I'm planning to do something like this:
```scala
activeQuery.stop()
// unpersist and persist cached data frame
df.unpersist()
// read the updated data; the data size of df is around 100 GB
df.persist()
activeQuery = startQuery()
```
The cached data frame is around 100 GB in size, so
Hi. I am using Spark for querying Hive, followed by transformations. My
Scala app creates multiple Spark applications. A new Spark context (and
session) is created only after closing the previous SparkSession and
SparkContext.
However, on stopping sc and spark, somehow connections to the Hive Metastore
You can do something like this:
```scala
def startQuery(): StreamingQuery = {
  // create your streaming dataframes
  // start the query with the same checkpoint directory
}

// handle to the active query
var activeQuery: StreamingQuery = null

while (!stopped) {
  if (activeQuery == null) { // if
```
Thanks Michael.
I guess my question was a little confusing; let me try again.
I would like to restart a streaming query programmatically, based on a
condition, while my streaming application is running. The reason I want to
do this: I want to refresh a cached data frame based on a condition, and the best
See
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing
Though I think that this currently doesn't work with the console sink.
On Tue, Aug 15, 2017 at 9:40 AM, purna pradeep
wrote:
> Hi,
The input dataset has multiple days' worth of data, so I thought the
watermark should have been crossed. To debug, I changed the query to the
code below. My expectation was that since I am doing 1-day windows with
late arrivals permitted for 1 second, when it sees records for the next
day, it would
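For what it's worth, the watermark arithmetic being assumed there can be checked in isolation (plain Scala with illustrative timestamps; the watermark is the max event time seen so far minus the allowed lateness):

```scala
import java.time.{Duration, Instant}

// Watermark = max event time seen so far minus the allowed lateness (1 s).
val lateness  = Duration.ofSeconds(1)
val windowEnd = Instant.parse("2017-08-16T00:00:00Z") // end of the 1-day window

def watermark(maxEventTime: Instant): Instant = maxEventTime.minus(lateness)

// A record exactly at midnight is not enough: the watermark is still 1 s short.
val notYet = !watermark(Instant.parse("2017-08-16T00:00:00Z")).isBefore(windowEnd)
// A record 1 s into the next day pushes the watermark to the window end.
val closed = !watermark(Instant.parse("2017-08-16T00:00:01Z")).isBefore(windowEnd)
```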
Hi,
How do I force Spark Kafka Direct to start from the latest offset when the
lag is huge, with the Kafka 0.10 integration? It seems to be processing from
the latest offset stored for a group id. One way around this is to change
the group id, but that would mean that each time we need to process the job
from the
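For context, the Kafka 0.10 consumer's knob for this is `auto.offset.reset`, but it is only consulted when the group has no committed offsets, which is exactly why changing the group id "works". A sketch of the consumer params as they'd be passed to the direct stream (broker address and group id are placeholders):

```scala
// Kafka consumer settings; with committed offsets present for the group,
// "auto.offset.reset" is ignored and consumption resumes from the stored
// offset, regardless of its value.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",       // placeholder
  "group.id"          -> "my-consumer-group", // placeholder
  "auto.offset.reset" -> "latest"
)
```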
Hi,
>
> I'm trying to restart a streaming query to refresh cached data frame
>
> Where and how should I restart streaming query
>
```scala
val sparkSes = SparkSession
  .builder
  .config("spark.master", "local")
  .appName("StreamingCahcePoc")
  .getOrCreate()
import
```
The ResultStage time is the time taken by your job's last stage.
"Job 13 finished: reduce at VertexRDDImpl.scala:90, took 0.035546 s" is the
total time your job took.
2017-08-14 18:58 GMT+08:00 Kaepke, Marc :
> Hi everyone,
>
> I’m a Spark newbie and have one question:
> What is the
How do I calculate the CPU time for a Spark job? Is there any interface that
can be called directly?
Something like how the Hadoop MapReduce framework provides "CPU time spent
(ms)" in the Counters.
Thanks!
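As a point of comparison, the JVM itself exposes per-thread CPU time through `ThreadMXBean`, which is the kind of counter such metrics are ultimately built on. A minimal, Spark-free sketch (the busy loop is just there to make the counter advance):

```scala
import java.lang.management.ManagementFactory

// Per-thread CPU time (nanoseconds) from the JVM's management interface.
// Note: getCurrentThreadCpuTime may return -1 on JVMs that don't support it.
val bean = ManagementFactory.getThreadMXBean

val before = bean.getCurrentThreadCpuTime
// do some work so the counter advances
var acc = 0L
var i = 0
while (i < 10000000) { acc += i; i += 1 }
val after = bean.getCurrentThreadCpuTime

val cpuNanos = after - before
```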