Thanks all...
btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop
2.4
I tried this on 1.3.1-hadoop26
sc.hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
val f = sc.textFile("s3n://bucket/file")
f.count
No, it can't find the
Hi firemonk9,
What you're doing looks interesting. Can you share some more details?
Are you running the same Spark context for each job, or are you running a
separate Spark context for each job?
Does your system need sharing of RDDs across multiple jobs? If yes, how do
you implement that?
Also
You need to expose that variable the same way you'd expose any other variable
in Python that you wanted to see across modules. As long as you share a spark
context all will work as expected.
http://stackoverflow.com/questions/142545/python-how-to-make-a-cross-module-variable
Hi all
Is there any good documentation on how to integrate Spark with Hue 3.7.x?
Is installing the Spark Job Server the only way?
Thanks in advance for your help
Hm, no I don't have that in my path.
However, someone on the spark-csv project advised that, since I could not
get another package/example to work, this might be a Spark / YARN
issue: https://github.com/databricks/spark-csv/issues/54
Thoughts? I'll open a ticket later this afternoon if the
I have seen multiple blogs stating to use reduceByKey instead of
groupByKey. Could someone please help me convert the code below to use
reduceByKey? (A generic sketch of the rewrite follows the snippet.)
Code
some spark processing
...
Below
val viEventsWithListingsJoinSpsLevelMetric:
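Since the original code is truncated above, here is a generic, untested
sketch of the usual groupByKey-to-reduceByKey rewrite (the pair RDD and
names are hypothetical, not the poster's actual code):

// Hypothetical (key, count) pairs; sc is the SparkContext.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// groupByKey ships every value across the network, then sums:
val viaGroup = pairs.groupByKey().mapValues(_.sum)

// reduceByKey combines values map-side before the shuffle,
// so far less data moves between executors:
val viaReduce = pairs.reduceByKey(_ + _)

Both yield the same result; reduceByKey is preferable whenever the per-key
combine function is associative and commutative.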
If I enable dynamicAllocation and then use spark-shell or pyspark,
things start out working as expected: running simple commands causes new
executors to start and complete tasks. If the shell is left idle for a
while, executors start getting killed off:
15/04/23 10:52:43 INFO
The NativeS3FileSystem class is in the hadoop-aws jar.
Looks like it was not on the classpath.
Cheers
On Thu, Apr 23, 2015 at 7:30 AM, Sujee Maniyam su...@sujee.net wrote:
Thanks all...
btw, s3n load works without any issues with spark-1.3.1-built-for-hadoop
2.4
I tried this on 1.3.1-hadoop26
I strongly recommend spawning a new process for the Spark jobs. Much
cleaner separation. Your driver program won't be clobbered if the Spark job
dies, etc. It can even watch for failures and restart.
In the Scala standard library, the sys.process package has classes for
constructing and running external processes.
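Something like this, as an untested sketch (the job jar and class name are
hypothetical):

import scala.sys.process._

// Build the spark-submit command line for the child process.
val cmd = Seq("spark-submit", "--class", "com.example.MyJob",
  "/path/to/myjob.jar")

// ! runs the process and blocks until it exits, returning the exit code,
// so the parent program can detect a failed job and restart it.
val exitCode = cmd.!
if (exitCode != 0) println(s"Spark job failed with exit code $exitCode")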
Using Spark streaming to create a large volume of small nano-batch input files,
~4k per file, thousands of ‘part-x’ files. When reading the nano-batch
files and doing a distributed calculation, my tasks run only on the machine
where the job was launched. I’m launching in “yarn-client” mode. The
Hi everybody,
I think I could use some help with the /updateStateByKey()/ JAVA method in
Spark Streaming.
*Context:*
I have a /JavaReceiverInputDStream<DataUpdate> du/ DStream, where object
/DataUpdate/ mainly has 2 fields of interest (in my case), namely
du.personId (an Integer) and
Sure
var columns = mc.textFile(source).map { line => line.split(delimiter) }
Here “source” is a comma delimited list of files or directories. Both the
textFile and .map tasks happen only on the machine they were launched from.
Later other distributed operations happen but I suspect if I can
Argh, I looked and there really isn’t that much data yet. There will be
thousands but starting small.
I bet this is just a total data size not requiring all workers thing—sorry,
nevermind.
On Apr 23, 2015, at 10:30 AM, Pat Ferrel p...@occamsmachete.com wrote:
They are in HDFS so available on
Hi All,
I am running some benchmarks on r3.8xlarge instances. I have a cluster with
one master (no executor on it) and one slave (r3.8xlarge).
My job has 1000 tasks in stage 0.
An r3.8xlarge has 244 GB memory and 32 cores.
If I create 4 executors, each with 8 cores + 50 GB memory, each task
Thanks for the response, Conor. I tried with those settings and for a while
it seemed like it was cleaning up shuffle files after itself. However,
after exactly 5 hours it started throwing exceptions and eventually
stopped working again. A sample stack trace is below. What is curious about
5
Hi Shuai,
You can use as to create a table alias. For example, df1.as("df1"). Then
you can use $"df1.col" to refer to it.
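For instance, a small sketch using the column name from the original
question (assumes import sqlContext.implicits._ for the $ syntax):

// Alias both frames, then qualify the shared key by alias:
val joined = df1.as("a").join(df2.as("b"), $"a.col1" === $"b.col1")
// The common key can now be selected unambiguously by name:
joined.select($"a.col1")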
Thanks,
Yin
On Thu, Apr 23, 2015 at 11:14 AM, Shuai Zheng szheng.c...@gmail.com wrote:
Hi All,
I use 1.3.1
When I have two DF and join them on a same name key, after
Where are the file splits? Meaning, is it possible they were also
(only) available on one node and that was also your driver?
On Thu, Apr 23, 2015 at 1:21 PM, Pat Ferrel p...@occamsmachete.com wrote:
Sure
var columns = mc.textFile(source).map { line => line.split(delimiter) }
Here “source”
Shuai:
Please take a look at:
http://blog.takipi.com/garbage-collectors-serial-vs-parallel-vs-cms-vs-the-g1-and-whats-new-in-java-8/
On Apr 23, 2015, at 10:18 AM, Dean Wampler deanwamp...@gmail.com wrote:
JVMs often have significant GC overhead with heaps bigger than 64GB. You
might try
Physically? Not sure, they were written using the nano-batch rdds in a
streaming job that is in a separate driver. The job is a Kafka consumer.
Would that affect all derived RDDs? If so, is there something I can do to mix it
up or does Spark know best about execution speed here?
On Apr 23,
Here it is. How do I access a broadcastVar in a function that's in another
module (process_stuff.py below):
Thanks,
Vadim
main.py
---
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext
from process_stuff import
Thanks Ilya. I am having trouble doing that. Can you give me an example?
On Thu, Apr 23, 2015 at 12:06 PM, Ganelin, Ilya ilya.gane...@capitalone.com
wrote:
You need to expose that variable the same way you'd expose any other
variable in Python that you wanted to see across modules. As long
Will you be able to paste code here?
On 23 April 2015 at 22:21, Pat Ferrel p...@occamsmachete.com wrote:
Using Spark streaming to create a large volume of small nano-batch input
files, ~4k per file, thousands of 'part-x' files. When reading the
nano-batch files and doing a distributed
Hi,
Attempted to request a negative number of executor(s) -663 from the
cluster manager. Please specify a positive number!
This is a bug in dynamic allocation. Here is the jira-
https://issues.apache.org/jira/browse/SPARK-6954
Thanks!
Cheolsoo
On Thu, Apr 23, 2015 at 7:57 AM, Michael Stone
What was the state of your streaming application? Was it falling behind
with a large increasing scheduling delay?
TD
On Thu, Apr 23, 2015 at 11:31 AM, N B nb.nos...@gmail.com wrote:
Thanks for the response, Conor. I tried with those settings and for a
while it seemed like it was cleaning up
Hi Everyone,
I am running into a really weird problem that only one other person has
reported to the best of my knowledge (and the thread never yielded a
resolution). I build a GraphX Graph from an input EdgeRDD and VertexRDD via
the Graph(VertexRDD,EdgeRDD) constructor. When I execute
Hi experts,
Sorry if this is a n00b question or has already been answered...
Am trying to use the data frames API in Python to join 2 dataframes
on more than 1 column. The example I've seen in the documentation
only shows a single column - so I tried this:
Example code
import pandas
Got it. Thanks! :)
From: Yin Huai [mailto:yh...@databricks.com]
Sent: Thursday, April 23, 2015 2:35 PM
To: Shuai Zheng
Cc: user
Subject: Re: Bug? Can't reference to the column by name after join two
DataFrame on a same name key
Hi Shuai,
You can use as to create a table alias. For
Should I repost this to dev list ?
I had asked this question before, but wanted to ask again as I think
it is related to my pom file or project setup.
I have been trying on/off for the past month to try to run this MLlib example:
I am trying to figure out Python library management. So my question is:
where do third-party Python libraries (e.g. numpy, scipy, etc.) need to exist
if I'm running a Spark job via 'spark-submit' against my cluster in 'yarn
client' mode? Do the libraries need to only exist on the client (i.e. the
Sorry, accidentally sent the last email before finishing.
I had asked this question before, but wanted to ask again as I think
it is now related to my pom file or project setup. Really appreciate the help!
I have been trying on/off for the past month to try to run this MLlib
example:
Can anybody point me to an example, if available, about gridsearch with python?
Thank you,
Hi Akhil
I can confirm that the problem goes away when jsonRaw and jsonClean are in
different s3 buckets.
thanks
Daniel
On Thu, Apr 23, 2015 at 1:27 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you try writing to a different S3 bucket and confirm that?
Thanks
Best Regards
On Thu,
They are in HDFS so available on all workers
On Apr 23, 2015, at 10:29 AM, Pat Ferrel p...@occamsmachete.com wrote:
Physically? Not sure, they were written using the nano-batch rdds in a
streaming job that is in a separate driver. The job is a Kafka consumer.
Would that affect all derived
Hi All,
I use 1.3.1
When I have two DF and join them on a same name key, after that, I can't get
the common key by name.
Basically:
select * from t1 inner join t2 on t1.col1 = t2.col1
And I am using purely DataFrame, spark SqlContext not HiveContext
DataFrame df3 =
Does Spark 1.3.1 support building with Scala 2.8? And if it is built with
Scala 2.10, can it integrate with kafka_2.8.0-0.8.0?
Thanks.
If you return an Iterable, you are not tying the API to a CompactBuffer.
Someday, the data could be fetched lazily and the API would not have to
change.
On Apr 23, 2015 6:59 PM, Dean Wampler deanwamp...@gmail.com wrote:
I wasn't involved in this decision (I just make the fries), but
Hi Andrew,
I observed similar behavior under high GC pressure, when running ALS. What
happened to me was that, there would be very long Full GC pauses (over 600
seconds at times). These would prevent the executors from sending
heartbeats to the driver. Then the driver would think that the
because CompactBuffer is considered an implementation detail. It is also
not public for the same reason.
On Thu, Apr 23, 2015 at 6:46 PM, Hao Ren inv...@gmail.com wrote:
Should I repost this to dev list ?
--
View this message in context:
[My apologies if this is a re-post. I wasn't subscribed the first time I
sent this message, and I'm hoping this second message will get through.]
I’ve been using Spark 1.3.0 and MLlib for some machine learning tasks. In a
fit of blind optimism, I decided to try running MLlib’s Principal
Hi Andrew,
The .principalComponents feature of RowMatrix is currently constrained to
tall and skinny matrices. Your matrix is barely above the skinny
requirement (10k columns), though the number of rows is fine.
What are you looking to do with the principal components? If unnormalized
PCA is OK
I wasn't involved in this decision (I just make the fries), but
CompactBuffer is designed for relatively small data sets that at least fit
in memory. It's more or less an Array. In principle, returning an iterator
could hide the actual data structure that might be needed to hold a much
bigger data
If by C you mean the parameter C in LIBLINEAR, the corresponding
parameter in MLlib is regParam:
https://github.com/apache/spark/blob/master/python/pyspark/mllib/classification.py#L273,
while regParam = 1/C. -Xiangrui
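For illustration, an untested Scala sketch of the same mapping (C = 10 is a
hypothetical LIBLINEAR value; trainingData is assumed to be an
RDD[LabeledPoint]):

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// regParam = 1/C, so LIBLINEAR's C = 10 becomes regParam = 0.1 here:
val lr = new LogisticRegressionWithLBFGS()
lr.optimizer.setRegParam(1.0 / 10.0)
// val model = lr.run(trainingData)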
On Wed, Apr 22, 2015 at 3:25 PM, Pagliari, Roberto
rpagli...@appcomsci.com
So I got the tip of trying to reduce the step size, and that finally gave
some more decent results. I had hoped for the default params to give at
least OK results and thought that the problem must be somewhere else in the
code.
Problem solved!
ALS.setCheckpointInterval was added in Spark 1.3.1. You need to
upgrade Spark to use this feature. -Xiangrui
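For reference, an untested sketch of using it on 1.3.1 (the checkpoint
directory is hypothetical; ratings is assumed to be an RDD[Rating]):

import org.apache.spark.mllib.recommendation.ALS

sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")
val model = new ALS()
  .setRank(10)
  .setIterations(20)
  .setCheckpointInterval(10) // checkpoint factor RDDs every 10 iterations
  .run(ratings)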
On Wed, Apr 22, 2015 at 9:03 PM, amghost zhengweita...@outlook.com wrote:
Hi, would you please explain how to checkpoint the training set RDD, since
everything is done inside the ALS.train method?
Anyone have any thoughts on this?
On 22 April 2015 at 22:49, Jeetendra Gangele gangele...@gmail.com wrote:
I set 7000 tasks in mapToPair, and in distinct I also set the same number
of tasks.
Still, lots of shuffle read and write is happening, and the application is
running much longer as a result.
Any idea?
Hi, SparkContext.newAPIHadoopRDD() is for working with the new Hadoop mapreduce API.
So, you should import
org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
instead of org.apache.accumulo.core.client.mapred.AccumuloInputFormat;
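An untested sketch of wiring that together (the Job is assumed to already
carry your Accumulo connection settings, configured via the
AccumuloInputFormat setters, omitted here):

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance()
// newAPIHadoopRDD pairs with input formats from the new (mapreduce) API:
val rdd = sc.newAPIHadoopRDD(
  job.getConfiguration,
  classOf[AccumuloInputFormat],
  classOf[Key],
  classOf[Value])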
-Original Message-
From: madhvi
Hi,
Yes, Spark automatically removes old RDDs from the cache when you make new
ones. Unpersist forces it to remove them right away.
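A small sketch of the difference (data is any RDD you have cached):

val cached = data.cache()
cached.count()     // materializes the RDD in the cache
cached.unpersist() // evicts it immediately, instead of waiting for Spark
                   // to drop it as newer RDDs fill the cache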
On Thu, Apr 23, 2015 at 9:28 AM, Jeffery [via Apache Spark User List]
ml-node+s1001560n22618...@n3.nabble.com wrote:
Hi, Dear Spark Users/Devs:
In a method, I
Hi,
I have come across ways of building pipelines of input/transform and output
with Java (Google Dataflow/Spark etc.). I also understand that
Spark itself provides ways for creating a pipeline within MLlib for
ML transforms (primarily fit). Both of the above are available in Java/Scala
It seems like saveAsTextFile might do what you are looking for.
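For instance, an untested sketch (the output path and formatting function
are hypothetical):

// Write the RDD as plain text under one HDFS directory,
// producing one part-* file per partition:
rdd.map(record => formatRecord(record)) // formatRecord: your own formatter
  .saveAsTextFile("hdfs:///data/output")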
On Wednesday, April 22, 2015, Xi Shen davidshe...@gmail.com wrote:
Hi,
I have a RDD of some processed data. I want to write these files to HDFS,
but not for future M/R processing. I want to write plain old style text
file. I
Hi,
I have a RDD of some processed data. I want to write these files to HDFS,
but not for future M/R processing. I want to write plain old style text
file. I tried:
rdd foreach { d =>
val file = // create the file using a HDFS FileSystem
val lines = d map {
// format data into string
}
the feature dimension is 800k.
yes, I believe the driver memory is likely the problem since it doesn't crash
until the very last part of the tree aggregation.
I'm running it via pyspark through YARN -- I have to run in client mode so I
can't set spark.driver.memory -- I've tried setting the
Hi All,
I am trying to execute batch processing in yarn-cluster mode, i.e. I have
many SQL insert queries; based on the argument provided, it will fetch the
queries, create the context and schema RDD, and insert into Hive tables.
Please note: in standalone mode it's working, and in cluster mode working is
Hello,
I am currently trying to monitor the progression of jobs. I created a class
extending SparkListener, added it as a listener on my SparkContext, and
overrode the methods onTaskStart, onTaskEnd, onJobStart and onJobEnd, which
leads to good results. (A minimal sketch of this setup follows.)
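A minimal sketch of that setup, with hypothetical handler bodies:

import org.apache.spark.scheduler._

class ProgressListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart): Unit =
    println(s"Job ${jobStart.jobId} started")
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
    println(s"Job ${jobEnd.jobId} ended")
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    println(s"Task finished in stage ${taskEnd.stageId}")
}

sc.addSparkListener(new ProgressListener()) // register on the SparkContext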
Then, I would also like to monitor
I do not think you can share data across spark contexts. So as long as you
can pass it around you should be good.
On 23 Apr 2015 17:12, Suraj Shetiya surajshet...@gmail.com wrote:
Hi,
I have come across ways of building pipeline of input/transform and output
pipelines with Java (Google
Hi,
Hive table creation needs an extra step from 1.3. You can follow this
template:
df.registerTempTable(tableName)
hc.sql(s"create table $tableName as select * from $tableName")
This will save the table in Hive with the given tableName.
Regards,
Madhukara Phatak
Quick questions: why are you caching both the RDD and the table?
Which stage of the job is slow?
On 23 Apr 2015 17:12, Nikolay Tikhonov tikhonovnico...@gmail.com wrote:
Hi,
I have Spark SQL performance issue. My code contains a simple JavaBean:
public class Person implements Externalizable {
Why are you caching both the RDD and the table?
I'm trying to cache all the data to avoid bad performance on the first
query. Is that right?
Which stage of the job is slow?
The query is run many times on one sqlContext and each query execution
takes 1 second.
2015-04-23 11:33 GMT+03:00 ayan guha
I have a groupBy query after a map-side leftOuterJoin, and this
query has been running for more than 2 hours.
Tasks (from the Spark UI):
Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Shuffle Read Size / Records | Write Time | Shuffle Write Size / Records | Errors
0 | 36 | 0 | RUNNING | PROCESS_LOCAL | 17
Hello Evo, Ranjitiyer,
I am also looking for the same thing. Using foreach is not useful for me as
processing the RDD as a whole won't be distributed across workers and that
would kill performance in my application :-/
Let me know if you find a solution for this.
Regards
You can use transform, which yields RDDs from the DStream; on each of those
RDDs you can then apply partitionBy. transform also returns another DStream,
while foreach doesn't. (A sketch follows.)
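An untested sketch (the key extractor and partition count are
hypothetical):

import org.apache.spark.HashPartitioner

// transform exposes each micro-batch RDD; repartition it and return the
// result, which yields a new DStream with the chosen partitioning:
val repartitioned = dstream.transform { rdd =>
  rdd.map(x => (keyOf(x), x)) // keyOf: your own key extractor
    .partitionBy(new HashPartitioner(8))
    .values
}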
Btw what do you mean re foreach killing the performance by not distributing
the workload - every function (provided it
Hi Michael,
Here https://issues.apache.org/jira/browse/SPARK-7084 is the jira issue
and PR https://github.com/apache/spark/pull/5654 for the same. Please
have a look.
Regards,
Madhukara Phatak
http://datamantra.io/
On Thu, Apr 23, 2015 at 1:22 PM, madhu phatak phatak@gmail.com wrote:
Thank you very much, Tathagata!
On Wednesday, April 22, 2015, Tathagata Das t...@databricks.com
wrote:
Aaah, that. That is probably a limitation of the SQLContext (cc'ing Yin
for more information).
On Wed, Apr 22, 2015 at 7:07 AM, Sergio Jiménez Barrio
drarse.a...@gmail.com
Hi
Can you share your Web UI, depicting your task-level breakup? I can see
many things that can be improved.
1. JavaRDD<Person> rdds = ...; rdds.cache(); - this caching is not needed as
you are not reading the RDD for any action.
2. Instead of collecting as a list, if you can save as a text file, it
I've already tried UDT in Spark 1.2 and 1.3 but I encountered a Kryo
Serialization Exception when joining, as tracked here
https://datastax-oss.atlassian.net/browse/SPARKC-23 . I've talked to
Michael Armbrust https://plus.google.com/u/1/109154927192908362223/posts
about the Exception; he said I'll
Hi
What do you mean by 'disable the driver'? What are you trying to achieve?
Thanks
Arush
On Thu, Apr 23, 2015 at 12:29 PM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Hi ,
I have a question about the Spark thrift server: I deployed Spark on YARN
and found that if the Spark driver
HI TD,
Some observations:
1. If I submit the application using the spark-submit tool with *client as
deploy mode*, it works fine with a single master and worker (driver, master
and worker are running on the same machine)
2. If I submit the application using spark-submit tool with client as
deploy mode it
Do you have commons-csv-1.1-bin.jar in your path somewhere? I had to
download and add this.
Cheers
k/
On Wed, Apr 22, 2015 at 11:01 AM, Mohammed Omer beancinemat...@gmail.com
wrote:
Afternoon all,
I'm working with Scala 2.11.6, and Spark 1.3.1 built from source via:
`mvn -Pyarn
On Thursday 23 April 2015 12:22 PM, Akhil Das wrote:
Here's a complete scala example
https://github.com/bbux-proteus/spark-accumulo-examples/blob/1dace96a115f29c44325903195c8135edf828c86/src/main/scala/org/bbux/spark/AccumuloMetadataCount.scala
Thanks
Best Regards
On Thu, Apr 23, 2015 at
Hi all ,
My understanding of this problem is that SQLConf will be overwritten by the
Hive config in the initialization phase, when setConf(key: String, value: String)
is called for the first time, as in the code snippets below, so it is correct
later on. I'm not sure whether this is right; any points are
My employer (adform.com) would like to use the Spark logo in a recruitment
event (to indicate that we are using Spark in our company). I looked in the
Spark repo (https://github.com/apache/spark/tree/master/docs/img) but
couldn't find a vector format.
Is a higher-res or vector format version
Hello everyone, do we have a sample example of how to use streaming k-means
clustering with Java? I have seen some example usage in Scala. Can anybody
point me to a Java example?
regards
jeetendra
Hello,
I would like to export RDD/DataFrames via JDBC SQL interface from the
standalone application for currently stable Spark v1.3.1.
I found one way of doing it but it requires the use of @DeveloperAPI method
HiveThriftServer2.startWithContext(sqlContext)
Is there a better, production level
For step 2, you can pipe application log to a file instead of copy-pasting.
Cheers
On Apr 22, 2015, at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I submit a spark app to YARN and i get these messages
15/04/22 22:45:04 INFO yarn.Client: Application report for
All these warnings come from ALS iterations, from flatMap and also from
aggregate; for instance, the origin of the stage where the flatMap is
showing these warnings (w/ Spark 1.3.0, they are also shown in Spark 1.3.1):
org.apache.spark.rdd.RDD.flatMap(RDD.scala:296)
Hi all,
I have been testing Spark ML algorithms with bigger dataset, and ran into
some problems with linear regression:
It seems the executors stop without any apparent reason:
15/04/22 20:15:05 INFO BlockManagerInfo: Added rdd_12492_80 in memory on
backend-node:48037 (size: 28.5 MB, free: 2.8
Following several discussions about how to improve the contribution
process in Spark, I've overhauled the guide to contributing. Anyone
who is going to contribute needs to read it, as it has more formal
guidance about the process:
ok yes, I think I have narrowed it down to being a problem with driver
memory settings. It looks like the application master/driver is not being
launched with the settings specified:
For the driver process on the main node I see -XX:MaxPermSize=128m
-Xms512m -Xmx512m as options used to start the
https://issues.apache.org/jira/browse/SPARK-7022.
Punya
On Thu, Apr 23, 2015 at 5:47 PM Pagliari, Roberto rpagli...@appcomsci.com
wrote:
Can anybody point me to an example, if available, about gridsearch with
python?
Thank you,
Hi, can you describe a little bit how the ThriftServer crashed, or steps to
reproduce it? It's probably a bug in the ThriftServer.
Thanks,
From: guoqing0...@yahoo.com.hk [mailto:guoqing0...@yahoo.com.hk]
Sent: Friday, April 24, 2015 9:55 AM
To: Arush Kharbanda
Cc: user
Subject: Re: Re: problem
*bump*
On Thu, Apr 23, 2015 at 3:46 PM, Sourav Chandra
sourav.chan...@livestream.com wrote:
HI TD,
Some observations:
1. If I submit the application using spark-submit tool with *client as
deploy mode* it works fine with single master and worker (driver, master
and worker are running in
Hi,
AFAIK it's only built with 2.10 and 2.11. You should integrate
kafka_2.10.0-0.8.0
to make it work. (A build.sbt sketch follows.)
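For illustration, an untested build.sbt fragment (versions are examples;
%% appends the Scala binary version so all artifacts stay consistent):

scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.3.1",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.3.1"
)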
Regards,
Madhukara Phatak
http://datamantra.io/
On Fri, Apr 24, 2015 at 9:22 AM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Does Spark 1.3.1 support building with Scala 2.8
I know grid search with cross-validation is not supported. However, I was
wondering if there is something available for the time being.
Thanks,
From: Punyashloka Biswal [mailto:punya.bis...@gmail.com]
Sent: Thursday, April 23, 2015 9:06 PM
To: Pagliari, Roberto; user@spark.apache.org
Thank you very much for your suggestion.
Regards,
From: madhu phatak
Date: 2015-04-24 13:06
To: guoqing0...@yahoo.com.hk
CC: user
Subject: Re: Is the Spark-1.3.1 support build with scala 2.8 ?
Hi,
AFAIK it's only built with 2.10 and 2.11. You should integrate
kafka_2.10.0-0.8.0 to make it
Hi guys,
Having a problem building a DataFrame in Spark SQL from a JDBC data source
when running with --master yarn-client and adding the JDBC driver JAR with
--jars. If I run with a local[*] master all works fine.
./bin/spark-shell --jars /tmp/libs/mysql-jdbc.jar --master yarn-client
You'd have to use spark.{driver,executor}.extraClassPath to modify the
system class loader. But that also means you have to manually
distribute the jar to the nodes in your cluster, into a common
location.
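For instance (the jar path is hypothetical, and the jar must already exist
at that path on every node), in conf/spark-defaults.conf:

spark.driver.extraClassPath   /opt/jars/mysql-jdbc.jar
spark.executor.extraClassPath /opt/jars/mysql-jdbc.jar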
On Thu, Apr 23, 2015 at 6:38 PM, Night Wolf nightwolf...@gmail.com wrote:
Hi guys,
Thanks Marcelo, can this be a path on HDFS?
On Fri, Apr 24, 2015 at 11:52 AM, Marcelo Vanzin van...@cloudera.com
wrote:
You'd have to use spark.{driver,executor}.extraClassPath to modify the
system class loader. But that also means you have to manually
distribute the jar to the nodes in your
No, those have to be local paths.
On Thu, Apr 23, 2015 at 6:53 PM, Night Wolf nightwolf...@gmail.com wrote:
Thanks Marcelo, can this be a path on HDFS?
On Fri, Apr 24, 2015 at 11:52 AM, Marcelo Vanzin van...@cloudera.com
wrote:
You'd have to use spark.{driver,executor}.extraClassPath to
Thanks for your reply. I would like to use the Spark Thriftserver as a JDBC
SQL interface, with the Spark application running on YARN, but the
application was FINISHED when the Thriftserver crashed and all the cached
tables were lost.
Thriftserver start command:
start-thriftserver.sh --master yarn
Dear All,
When using spark 1.3.0 spark-submit with stdout and stderr redirected to a
log file, I saw some strange lines inside that look like this:
[Stage 0:(0 + 2) / 120]
[Stage 0:(2 + 2) /
Use this in spark conf: spark.ui.showConsoleProgress=false
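A minimal sketch of setting it programmatically (it can equally go in
spark-defaults.conf or be passed as --conf to spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

// Turn off the console progress bar that writes [Stage 0: ...] to stderr:
val conf = new SparkConf().set("spark.ui.showConsoleProgress", "false")
val sc = new SparkContext(conf)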
Best Regards,
On Fri, Apr 24, 2015 at 11:23 AM, Henry Hung ythu...@winbond.com wrote:
Dear All,
When using spark 1.3.0 spark-submit with stdout and stderr redirected to a
log file, I saw some strange lines inside that look like this: