Re: spark.speculation setting support on standalone mode?

2017-02-27 Thread Saisai Shao
I think it should be. These configurations don't depend on which cluster manager the user chooses. On Tue, Feb 28, 2017 at 4:42 AM, satishl wrote: > Are spark.speculation and related settings supported on standalone mode?
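
For anyone searching the archive, a minimal sketch of enabling it (the property names are the standard ones from the Spark configuration page; the values shown are the documented defaults, included only as an illustration):

    import org.apache.spark.SparkConf

    // Speculation is handled by the task scheduler, not the cluster manager,
    // so the same settings apply under standalone, YARN, or Mesos.
    val conf = new SparkConf()
      .set("spark.speculation", "true")
      .set("spark.speculation.interval", "100ms")  // how often to check for slow tasks
      .set("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
      .set("spark.speculation.quantile", "0.75")   // fraction of tasks that must finish before checking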

Run spark machine learning example on Yarn failed

2017-02-27 Thread Yunjie Ji
After starting dfs, yarn, and spark, I ran this command from the root directory of Spark on my master host: `MASTER=yarn ./bin/run-example ml.LogisticRegressionExample data/mllib/sample_libsvm_data.txt` I actually took this command from Spark's README. And here is the source code about
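
For anyone reproducing this: run-example is a thin wrapper around spark-submit, so an equivalent invocation would look roughly like the following (the jar name assumes a Spark 2.1.0 / Scala 2.11 binary build):

    ./bin/spark-submit \
      --master yarn \
      --deploy-mode client \
      --class org.apache.spark.examples.ml.LogisticRegressionExample \
      examples/jars/spark-examples_2.11-2.1.0.jar \
      data/mllib/sample_libsvm_data.txt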

Error while enabling Hive Support in Spark 2.1

2017-02-27 Thread SRK
Hi, I have been trying to get my Spark job upgraded to 2.x, and I see the following error. It seems to be looking for some global_temp database by default. Is it the behaviour of Spark 2.x to look for a global_temp database by default?

17/02/27 16:59:09 INFO HiveMetaStore.audit: ugi=user1234
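
For what it's worth, Spark 2.x reserves a database name (global_temp by default) for global temporary views, so a metastore lookup for it during session start-up is expected. A minimal sketch of building a Hive-enabled session in 2.x (the app name is illustrative):

    import org.apache.spark.sql.SparkSession

    // In 2.x, Hive support is enabled on the session builder;
    // the old HiveContext is deprecated.
    val spark = SparkSession.builder()
      .appName("hive-support-check")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()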

using spark to load a data warehouse in real time

2017-02-27 Thread Adaryl Wakefield
Is anybody using Spark Streaming/SQL to load a relational data warehouse in real time? There isn't a lot of information on this use case out there. When I google "real-time data warehouse load", nothing I find is up to date. It's all turn-of-the-century material that doesn't take into account

[Spark Kafka] API Doc pages for Kafka 0.10 not current

2017-02-27 Thread Afshartous, Nick
Hello, Looks like the API docs linked from the Spark Kafka 0.10 Integration page are not current. For instance, on the page https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html the code examples show the new API (i.e. class ConsumerStrategies). However,
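
For readers landing here, the shape of the new 0.10 API that the page's examples use is as follows (a sketch assuming an existing StreamingContext named ssc and a broker at localhost:9092):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // ConsumerStrategies replaces the per-method overloads of the 0.8 API
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("topicA"), kafkaParams)
    )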

spark.speculation setting support on standalone mode?

2017-02-27 Thread satishl
Are spark.speculation and related settings supported on standalone mode?

Re: Spark - YARN Cluster Mode

2017-02-27 Thread ayan guha
Hi, thanks a lot. I used a properties file to resolve the issue; I think the documentation should mention it, though. On Tue, 28 Feb 2017 at 5:05 am, Marcelo Vanzin wrote: > > none of my Config settings > > Is it none of the configs or just the queue? You can't set the YARN > queue

Re: How to set hive configs in Spark 2.1?

2017-02-27 Thread swetha kasireddy
Would even the hive configurations like the following work with this?

sqlContext.setConf("hive.default.fileformat", "Orc")
sqlContext.setConf("hive.exec.orc.memory.pool", "1.0")
sqlContext.setConf("hive.optimize.sort.dynamic.partition", "true")

Re: Spark runs out of memory with small file

2017-02-27 Thread Henry Tremblay
Thanks! That works:

from pyspark.sql import Row

def process_file(my_iter):
    the_id = "init"
    final = []
    for chunk in my_iter:
        lines = chunk[1].split("\n")
        for line in lines:
            if line[0:15] == 'WARC-Record-ID:':
                the_id = line[15:]
            # keep the current record id attached to every line
            final.append(Row(the_id=the_id, line=line))
    return final

[Spark 2.1.0 ML] Serializing/Deserializing LocalLDA Problem

2017-02-27 Thread Benjamin Edwards
I am hoping someone can confirm this is a bug and/or provide a solution. I am trying to serialize an LDA model to disk for later use, but upon deserialization the model is not fully functional. In particular, transformation of data throws a NullPointerException. Here is a minimal example (just run
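
For context, the round trip being described looks roughly like this (a sketch assuming the spark.ml clustering API, a DataFrame named training with a features column, and a hypothetical save path):

    import org.apache.spark.ml.clustering.{LDA, LocalLDAModel}

    val model = new LDA().setK(10).setMaxIter(10).fit(training)
    model.write.overwrite().save("/tmp/lda-model")

    // transform() on the reloaded model is where the reported
    // NullPointerException appears
    val restored = LocalLDAModel.load("/tmp/lda-model")
    restored.transform(training).show()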

Re: Spark - YARN Cluster Mode

2017-02-27 Thread Marcelo Vanzin
> none of my Config settings Is it none of the configs or just the queue? You can't set the YARN queue in cluster mode through code; it has to be set on the command line. It's a chicken-and-egg problem (in cluster mode, the YARN app is created before your code runs). --properties-file works the
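
Concretely, that means putting the queue on the command line, or in a properties file passed there (the queue and file names below are illustrative):

    # set the queue directly on submission:
    ./bin/spark-submit --master yarn --deploy-mode cluster --queue myqueue ...

    # or keep it in a properties file:
    ./bin/spark-submit --properties-file my-job.conf ...
    # where my-job.conf contains a line such as:
    #   spark.yarn.queue  myqueue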

Re: How to set hive configs in Spark 2.1?

2017-02-27 Thread neil90
All you need to do is:

spark.conf.set("spark.sql.shuffle.partitions", 2000)
spark.conf.set("spark.sql.orc.filterPushdown", True)
...etc

Re: Is there a list of missing optimizations for typed functions?

2017-02-27 Thread lihu
Hi, you can refer to https://issues.apache.org/jira/browse/SPARK-14083 for more detail. For performance, it is better to use the DataFrame API than the Dataset API. On Sat, Feb 25, 2017 at 2:45 AM, Jacek Laskowski wrote: > Hi Justin, > > I have never seen such a list. I think
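
A small illustration of the gap SPARK-14083 describes (a Scala sketch; the Person class and input path are hypothetical):

    import spark.implicits._   // for the $"..." column syntax

    case class Person(name: String, age: Long)
    val ds = spark.read.parquet("/data/people").as[Person]

    // Typed filter: an arbitrary JVM closure, so Catalyst cannot look inside
    // it, and optimizations such as predicate pushdown are lost.
    ds.filter(person => person.age > 21)

    // Untyped filter: a Column expression Catalyst can analyze, optimize,
    // and push down to the data source.
    ds.filter($"age" > 21)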

How to set hive configs in Spark 2.1?

2017-02-27 Thread SRK
Hi, how do I set the hive configurations in Spark 2.1? I have the following in 1.6. How do I set these hive-related configs using the new SparkSession?

sqlContext.sql(s"use ${HIVE_DB_NAME} ")
sqlContext.setConf("hive.exec.dynamic.partition", "true")
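
Matching the replies elsewhere in this thread, the 1.6-style calls translate to the unified SparkSession in 2.1 roughly as follows (a sketch; HIVE_DB_NAME is the same variable as in the snippet above):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .enableHiveSupport()   // required for Hive-specific settings to take effect
      .getOrCreate()

    spark.sql(s"use ${HIVE_DB_NAME}")
    spark.conf.set("hive.exec.dynamic.partition", "true")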

Re: Structured Streaming: How to handle bad input

2017-02-27 Thread sasubillis
I think it is the user's responsibility to validate the input before feeding it in. https://databricks.gitbooks.io/databricks-spark-knowledge-base/best_practices/dealing_with_bad_data.html

Re: Get S3 Parquet File

2017-02-27 Thread Femi Anthony
Ok, thanks a lot for the heads up. Sent from my iPhone > On Feb 25, 2017, at 10:58 AM, Steve Loughran wrote: > > >> On 24 Feb 2017, at 07:47, Femi Anthony wrote: >> >> Have you tried reading using s3n, which is a slightly older protocol? I'm >>

handling dependency conflicts with spark

2017-02-27 Thread Mendelson, Assaf
Hi, I have a project which uses Jackson 2.8.5; Spark, on the other hand, seems to be using 2.6.5. I am using maven to compile. My original solution to the problem has been to set the spark dependencies with the "provided" scope and use the maven shade plugin to shade Jackson in my compilation. The
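
To make the shading concrete, a pom.xml sketch of the relocation described (the plugin version and shaded prefix are illustrative, not prescriptive):

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <relocations>
              <!-- rewrite my Jackson 2.8.5 classes into a private namespace so
                   they cannot clash with the 2.6.5 classes on Spark's classpath -->
              <relocation>
                <pattern>com.fasterxml.jackson</pattern>
                <shadedPattern>shaded.com.fasterxml.jackson</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>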

Re: Spark 2.1.0 issue with spark-shell and pyspark

2017-02-27 Thread romain.jouin
Hi,

master = "spark://193.70.43.207:7077"
appName = "romain2"
spark = SparkSession.builder.master(master).appName(appName).getOrCreate()

also gives me an error: IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':" Any way out?

Re: attempting to map Dataset[Row]

2017-02-27 Thread Yan Facai
Hi, Fletcher. A case class can help construct a complex structure, and RDD, StructType, and StructField are also helpful if you need them. However, the code is a little confusing:

source.map { row => {
    val key = row(0)
    val buff = new ArrayBuffer[Row]()
    buff += row
    (key, buff)
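
If the goal of that snippet is to gather rows under the value of their first column, a less confusing sketch (assuming the key is a string and spark.implicits._ is in scope) might let the engine do the grouping:

    // group rows by the first column instead of hand-building
    // (key, ArrayBuffer[Row]) pairs inside map
    val grouped = source
      .groupByKey(row => row.getString(0))
      .mapGroups { (key, rows) => (key, rows.length) }   // e.g. count per key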

Re: Spark runs out of memory with small file

2017-02-27 Thread Henry Tremblay
This won't work:

rdd2 = rdd.flatMap(splitf)
rdd2.take(1)
[u'WARC/1.0\r']
rdd2.count()
508310

If I then try to apply a map to rdd2, the map only works on each individual line. I need to create a state machine as in my second function. That is, I need to apply a key to each line, but the