Re: Databricks fails to read the csv file with blank line at the file header

2016-03-26 Thread Koert Kuipers
To me this is expected behavior that I would not want fixed, but if you look at the recent commits for spark-csv it has one that deals with this... On Mar 26, 2016 21:25, "Mich Talebzadeh" wrote: > > Hi, > > I have a standard csv file (saved as csv in HDFS) that has first

Databricks fails to read the csv file with blank line at the file header

2016-03-26 Thread Mich Talebzadeh
Hi, I have a standard csv file (saved as csv in HDFS) that has a blank first line above the header, as follows: [blank line] Date, Type, Description, Value, Balance, Account Name, Account Number [blank line] 22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN
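
A minimal workaround sketch for this thread (assuming Spark 1.6 with the spark-csv package on the classpath and an existing SparkContext `sc`; the HDFS paths are hypothetical): filter out the blank lines first, then let spark-csv treat the first remaining line as the header.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// drop blank lines so the header row becomes the first line of the cleaned copy;
// coalesce(1) keeps a single output file so the header stays first
sc.textFile("hdfs:///tmp/statement.csv")
  .filter(_.trim.nonEmpty)
  .coalesce(1)
  .saveAsTextFile("hdfs:///tmp/statement_clean")

// spark-csv can now pick up the header normally
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///tmp/statement_clean")
df.printSchema()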

Re: Limit pyspark.daemon threads

2016-03-26 Thread Sven Krasser
Hey Ken, 1. You're correct, cached RDDs live on the JVM heap. (There's an off-heap storage option using Alluxio, formerly Tachyon, with which I have no experience, however.) 2. The worker memory setting is unfortunately not a hard maximum. What happens is that during aggregation the Python daemon

Re: Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Thanks Ted. I am more interested in the general availability of Hive 2 on the Spark 1.6 engine, as opposed to vendor-specific custom builds. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Hive on Spark engine

2016-03-26 Thread Ted Yu
According to: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/bk_HDP_RelNotes-20151221.pdf Spark 1.5.2 comes out of the box. I suggest moving questions on HDP to the Hortonworks forum. Cheers On Sat, Mar 26, 2016 at 3:32 PM, Mich Talebzadeh wrote: >

Re: Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Thanks Jörn. Just to be clear, do they get Hive working with Spark 1.6 out of the box (binary download)? The usual work-around is to build your own package and get the Hadoop-assembly jar file copied over to $HIVE_HOME/lib. Cheers Dr Mich Talebzadeh LinkedIn *

Re: Hive on Spark engine

2016-03-26 Thread Jörn Franke
If you check the newest Hortonworks distribution you will see that it generally works. Maybe you can borrow some of their packages. Alternatively, it should also be available in other distributions. > On 26 Mar 2016, at 22:47, Mich Talebzadeh wrote: > > Hi, > > I am

Hive on Spark engine

2016-03-26 Thread Mich Talebzadeh
Hi, I am running Hive 2 and now Spark 1.6.1, but I still do not see any sign that Hive can utilise a Spark engine higher than 1.3.1. My understanding was that there was a mismatch in the Hadoop assembly jar files that prevented Hive from running on Spark using the binary downloads. I just tried

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
This is extremely helpful! I’ll have to talk to my users about how the python memory limit should be adjusted and what their expectations are. I’m fairly certain we bumped it up in the dark past when jobs were failing because of insufficient memory for the python processes.  So

Re: Limit pyspark.daemon threads

2016-03-26 Thread Sven Krasser
My understanding is that the spark.executor.cores setting controls the number of worker threads in the executor in the JVM. Each worker thread communicates then with a pyspark daemon process (these are not threads) to stream data into Python. There should be one daemon process per worker thread
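
For reference, the settings discussed in this thread can be wired up like this (a sketch only; the values are arbitrary examples, and the spark.python.worker.memory key only affects PySpark applications):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("pyspark-daemon-limits")
  .set("spark.executor.cores", "4")            // worker threads per executor JVM
  .set("spark.python.worker.memory", "512m")   // per-Python-worker aggregation buffer before spilling (PySpark only)
  .set("spark.executor.memory", "8g")          // JVM heap, which is where cached RDDs live
val sc = new SparkContext(conf)

In a PySpark application the same keys go on the Python-side SparkConf; the point here is only which setting maps to which resource.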

A problem involving Spark & HBase.

2016-03-26 Thread ManasjyotiSharma
Disclaimer: This is more of a design question. I am very new to Spark and HBase. This is going to be my first project using these 2 technologies, and for the last 2 months or so I have just been going over different resources to get a grasp of Spark and HBase. My question is mainly in terms

Fwd: This simple UDF is not working!

2016-03-26 Thread Mich Talebzadeh
Thanks great Dhaval. scala> import java.text.SimpleDateFormat import java.text.SimpleDateFormat scala> scala> import java.sql.Date import java.sql.Date scala> scala> import scala.util.{Try, Success, Failure} import scala.util.{Try, Success, Failure} scala> val toDate = udf{(out:String, form:

Re: whether a certain piece can be assigned to a specified node by some code in my program.

2016-03-26 Thread Ted Yu
Please take a look at the following method: /** * Get the preferred locations of a partition, taking into account whether the * RDD is checkpointed. */ final def preferredLocations(split: Partition): Seq[String] = { checkpointRDD.map(_.getPreferredLocations(split)).getOrElse {

whether a certain piece can be assigned to a specified node by some code in my program.

2016-03-26 Thread chenyong
I am a newbie to the great Spark framework. After reading some materials about Spark, I know that an RDD dataset is actually broken into pieces and distributed among several nodes. I am wondering whether a certain piece can be assigned to a specified node by some code in my program. Or
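
A small sketch along the lines of Ted's pointer (assuming an existing SparkContext `sc`; the host names are hypothetical): location preferences can be supplied when an RDD is created with makeRDD and read back with preferredLocations. The scheduler treats them as hints, not guarantees.

// each element becomes one partition, tagged with its preferred hosts
val data = Seq(
  (1 to 100,   Seq("node1.example.com")),
  (101 to 200, Seq("node2.example.com"))
)
val rdd = sc.makeRDD(data)

// inspect what the scheduler will see for each partition
rdd.partitions.foreach(p => println(s"partition ${p.index} prefers ${rdd.preferredLocations(p)}"))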

Re: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-26 Thread Ted Yu
That's quite informative, Michal, though I can't read the first few slides, which are not in English. On Sat, Mar 26, 2016 at 6:12 AM, Michał Zieliński < zielinski.mich...@gmail.com> wrote: > Ted, > > Sure. This was presented by my colleague during Data Science London > meetup. The talk was

Re: Create one DB connection per executor

2016-03-26 Thread Manas
Thanks much Gerard & Manas for your inputs. I'll keep in mind the connection pooling part. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Create-one-DB-connection-per-executor-tp26588p26601.html Sent from the Apache Spark User List mailing list archive at
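
For reference, a minimal sketch of the per-partition connection pattern that usually comes up in this discussion (the JDBC URL, credentials, and table are hypothetical, and `records` is assumed to be an RDD[(Int, String)]). A true per-executor pool would instead live in a lazily initialised singleton object on the executors.

import java.sql.DriverManager

records.foreachPartition { rows =>
  // one connection per partition (i.e. per task), reused for every row in that partition
  val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb", "user", "secret")
  val stmt = conn.prepareStatement("INSERT INTO events (id, name) VALUES (?, ?)")
  try {
    rows.foreach { case (id, name) =>
      stmt.setInt(1, id)
      stmt.setString(2, name)
      stmt.executeUpdate()
    }
  } finally {
    stmt.close()
    conn.close()
  }
}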

Re: Is this expected in Spark 1.6.1, derby.log file created when spark shell starts

2016-03-26 Thread Ted Yu
Same with master branch. I found derby.log in the following two files: .gitignore:derby.log dev/.rat-excludes:derby.log FYI On Sat, Mar 26, 2016 at 4:09 AM, Mich Talebzadeh wrote: > Having moved to Spark 1.6.1, I have noticed thar whenerver I start a > spark-sql or

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
Thanks, Sven! I know that I've messed up the memory allocation, but I'm trying not to think too much about that (because I've advertised it to my users as "90GB for Spark works!", and that's how it displays in the Spark UI, totally ignoring the python processes). So I'll need to deal

Fwd: Any plans to migrate Transformer API to Spark SQL (closer to DataFrames)?

2016-03-26 Thread Michał Zieliński
Ted, Sure. This was presented by my colleague during Data Science London meetup. The talk was about "Scalable Predictive Pipelines with Spark & Scala". Link to the meetup and slides below: http://www.meetup.com/Data-Science-London/events/229755935/

Is this expected in Spark 1.6.1, derby.log file created when spark shell starts

2016-03-26 Thread Mich Talebzadeh
Having moved to Spark 1.6.1, I have noticed that whenever I start spark-sql or spark-shell, a derby.log file is created in the directory! cat derby.log Sat Mar 26 11:18:55 GMT 2016: Booting Derby version The Apache Software Foundation
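
One way to tame this (a sketch, not something the thread itself settled on): Derby writes derby.log to the current working directory unless the derby.stream.error.file JVM property points elsewhere, so setting that property before the Hive metastore first boots Derby relocates the file. The path below is a placeholder.

// equivalent to launching the shell with:
//   --conf "spark.driver.extraJavaOptions=-Dderby.stream.error.file=/tmp/derby.log"
// must run before anything touches the Hive metastore (and therefore Derby)
System.setProperty("derby.stream.error.file", "/tmp/derby.log")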

Re: Finding out the time a table was created

2016-03-26 Thread Mich Talebzadeh
Hi Ted, I moved to Spark 1.6. Still the same issue outstanding. [spark-shell startup banner: Spark version 1.6.1] Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_25) Type
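
For the original question in this thread (when a table was created), one option for Hive-backed tables is to read the CreateTime field that the metastore reports, e.g. via DESCRIBE FORMATTED (the table name below is a placeholder, and a Hive-backed sqlContext is assumed):

// the output of DESCRIBE FORMATTED on a Hive metastore table includes a CreateTime row
sqlContext.sql("DESCRIBE FORMATTED my_table")
  .collect()
  .foreach(println)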

Re: is there any way to submit spark application from outside of spark cluster

2016-03-26 Thread Hyukjin Kwon
Hi, For a RESTful API for submitting an application, please take a look at this link: http://arturmkrtchyan.com/apache-spark-hidden-rest-api On 26 Mar 2016 12:07 p.m., "vetal king" wrote: > Prateek > > It's possible to submit a Spark application from an outside application. If
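
Besides the hidden REST endpoint linked above, another option (a sketch; it assumes a Spark installation is reachable from the submitting machine, and the paths, class name, and master URL are placeholders) is the programmatic SparkLauncher API that ships with Spark 1.4+:

import org.apache.spark.launcher.SparkLauncher

// builds and runs a spark-submit from any JVM that can reach the cluster
val process = new SparkLauncher()
  .setSparkHome("/opt/spark")                 // placeholder
  .setAppResource("/path/to/my-app.jar")      // placeholder
  .setMainClass("com.example.MyApp")          // placeholder
  .setMaster("spark://master-host:7077")      // placeholder
  .setDeployMode("cluster")
  .launch()

process.waitFor()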

Re: This simple UDF is not working!

2016-03-26 Thread Dhaval Modi
Hi Mich, You can try this: val toDate = udf{(out:String, form: String) => { val format = new SimpleDateFormat(s"$form"); Try(new Date(format.parse(out.toString()).getTime)) match { case Success(t) => Some(t) case Failure(_) => None }}}; Usage: src = src.withColumn(s"$columnName",
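
A self-contained version of the UDF sketched above (lightly cleaned; the usage line is truncated in the archive, so the column name and date pattern below are placeholders):

import java.text.SimpleDateFormat
import java.sql.Date
import scala.util.{Try, Success, Failure}
import org.apache.spark.sql.functions.{udf, lit}

// returns Some(java.sql.Date) on success, None (SQL NULL) on a parse failure
val toDate = udf { (out: String, form: String) =>
  val format = new SimpleDateFormat(form)
  Try(new Date(format.parse(out).getTime)) match {
    case Success(t) => Some(t)
    case Failure(_) => None
  }
}

// usage sketch: `src` is a DataFrame with a string column "payment_date" (placeholder)
val converted = src.withColumn("payment_date", toDate(src("payment_date"), lit("dd/MM/yyyy")))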