Not able to receive data in spark from rsyslog

2015-12-03 Thread masoom alam
I am getting an error that I am not able to receive data in my Spark streaming application from rsyslog. Please help with any pointers. 9 - java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at

Re: How to test https://issues.apache.org/jira/browse/SPARK-10648 fix

2015-12-03 Thread Madabhattula Rajesh Kumar
Hi JB and Ted, Thank you very much for the steps Regards, Rajesh On Thu, Dec 3, 2015 at 8:16 PM, Ted Yu wrote: > See this thread for Spark 1.6.0 RC1 > > > http://search-hadoop.com/m/q3RTtKdUViYHH1b1=+VOTE+Release+Apache+Spark+1+6+0+RC1+ > > Cheers > > On Thu, Dec 3, 2015

Re: Python API Documentation Mismatch

2015-12-03 Thread Yanbo Liang
Hi Roberto, There are two ALS available: ml.recommendation.ALS and mllib.recommendation.ALS .

Re: Re: spark master - run-tests error

2015-12-03 Thread wei....@kaiyuandao.com
it works like a charm for me. thanks for the quick workaround From: Ted Yu Date: 2015-12-04 10:45 To: wei@kaiyuandao.com CC: user Subject: Re: spark master - run-tests error From dev/run-tests.py : def identify_changed_files_from_git_commits(patch_sha, target_branch=None,

Re: newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure ?

2015-12-03 Thread Divya Gehlot
Hello, I have the same queries in mind. What are the advantages of using EC2 compared to normal servers for Spark and other big data product development? Hope to get inputs from the community. Thanks, Divya On Dec 4, 2015 6:05 AM, "Andy Davidson"

Re: Re: spark sql cli query results written to file ?

2015-12-03 Thread fightf...@163.com
Well, sorry for the late response and thanks a lot for pointing out the clue. fightf...@163.com From: Akhil Das Date: 2015-12-03 14:50 To: Sahil Sareen CC: fightf...@163.com; user Subject: Re: spark sql cli query results written to file ? Oops 3 mins late. :) Thanks Best Regards On Thu, Dec

How to test https://issues.apache.org/jira/browse/SPARK-10648 fix

2015-12-03 Thread Madabhattula Rajesh Kumar
Hi Team, Looks like this issue is fixed in the 1.6 release. How can I test this fix? Is any jar available, so that I can add it as a dependency and test the fix? Or is there any other way I can test this fix against the 1.5.2 code base? Could you please let me know the steps. Thank you for your support Regards,

Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
Hi All, need help guys, I need a workaround for this situation. *case where this works:* val TestDoc1 = sqlContext.createDataFrame(Seq(("sachin aggarwal", "1"), ("Rishabh", "2"))).toDF("myText", "id")

Re: Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sahil Sareen
Attaching the JIRA as well for completeness: https://issues.apache.org/jira/browse/SPARK-12117 On Thu, Dec 3, 2015 at 4:13 PM, Sachin Aggarwal wrote: > > Hi All, > > need help guys, I need a work around for this situation > > *case where this works:* > > val TestDoc1
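To make the report concrete, here is a minimal sketch of the pattern in question (the UDF and the struct field names are illustrative, assuming Spark 1.5's DataFrame API):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{callUDF, col, struct}

    // Illustrative UDF that reports the field names of its struct argument.
    sqlContext.udf.register("fieldNames",
      (r: Row) => r.schema.fieldNames.mkString(","))

    val testDoc = sqlContext.createDataFrame(
      Seq(("sachin aggarwal", "1"), ("Rishabh", "2"))).toDF("myText", "id")

    // Per SPARK-12117, the aliases given inside struct() are reportedly ignored,
    // so the UDF may see generated field names rather than "text" / "docId".
    val result = testDoc.select(callUDF("fieldNames",
      struct(col("myText").as("text"), col("id").as("docId"))))
    result.show()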

How and where to update release notes for spark rel 1.6?

2015-12-03 Thread RaviShankar KS
Hi, How and where do I update the release notes for Spark release 1.6? Please help. There are a few methods with changed params, and a few deprecated ones that need to be documented. Thanks Ravi

Re: Spark Streaming from S3

2015-12-03 Thread Steve Loughran
On 3 Dec 2015, at 00:42, Michele Freschi wrote: Hi all, I have an app streaming from s3 (textFileStream) and recently I've observed increasing delay and a long time to list files: INFO dstream.FileInputDStream: Finding new files took 394160

Re: Building spark 1.3 from source code to work with Hive 1.2.1

2015-12-03 Thread zhangjp
I have encountered the same issues. Before I changed the Spark version I set up the environment as follows: Spark 1.5.2, Hadoop 2.6.2, Hive 1.2.1. But no luck, it does not work well; even though I ran the assembly Hive in Spark with JDBC mode there were also some problems. Then I changed the Spark

Python API Documentation Mismatch

2015-12-03 Thread Roberto Pagliari
Hello, I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation are, in fact, available. For example, the code below

Re: Checkpointing not removing shuffle files from local disk

2015-12-03 Thread Ewan Higgs
Hi all, We are running a class with a Pyspark notebook for data analysis. Some of the notebooks are fairly long and have a lot of operations. Through the course of the notebook, the shuffle storage expands considerably and often exceeds quota (e.g. 1.5GB input expands to 24GB in shuffle files). Closing

RE: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Mich Talebzadeh
Can you try running it directly on Hive to see the timing, or through spark-sql maybe. Spark does what Hive does, that is, processing large sets of data, but it attempts to do the intermediate iterations in memory if it can (i.e. if there is enough memory available to keep the data set in

Re: Can Spark Execute Hive Update/Delete operations

2015-12-03 Thread 张炜
Hi all, Sorry, the referenced link is not using a private/own branch of Hive. It's using Hortonworks 2.3 and the Hive packaged in HDP 2.3, with a standalone Spark cluster (1.5.2) installed. But Hive on Spark cannot run. Could anyone help with this? Thanks a lot! Regards, Sai On Wed,

spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread hxw黄祥为
Dear All, I have a Hive table with 100 million records and I just ran some very simple operations on this dataset, like: val df = sqlContext.sql("select * from user ").toDF df.cache df.registerTempTable("tb") val b=sqlContext.sql("select

Re: spark1.4.1 extremely slow for take(1) or head() or first() or show

2015-12-03 Thread Sahil Sareen
"select 'uid',max(length(uid)),count(distinct(uid)),count(uid),sum(case when uid is null then 0 else 1 end),sum(case when uid is null then 1 else 0 end),sum(case when uid is null then 1 else 0 end)/count(uid) from tb" Is this as is, or did you use a UDF here? -Sahil On Thu, Dec 3, 2015 at 4:06

Re: LDA topic modeling and Spark

2015-12-03 Thread Robin East
What exactly is this probability distribution? For each word in your vocabulary it is the probability that a randomly drawn word from a topic is that word. Another way to visualise it is a 2-column vector where the 1st column is a word in your vocabulary and the 2nd column is the probability of
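As a concrete illustration against MLlib's LDA (a sketch in the style of the 1.5 docs; the tiny corpus here is a stand-in):

    import org.apache.spark.mllib.clustering.LDA
    import org.apache.spark.mllib.linalg.Vectors

    // Stand-in corpus: (docId, termFrequencyVector) over a 5-word vocabulary.
    val corpus = sc.parallelize(Seq(
      (0L, Vectors.dense(1.0, 2.0, 0.0, 0.0, 1.0)),
      (1L, Vectors.dense(0.0, 0.0, 3.0, 1.0, 0.0))))

    val model = new LDA().setK(2).run(corpus)

    // topicsMatrix is vocabSize x k: column j holds, for each word in the
    // vocabulary, the probability that a word drawn from topic j is that word.
    val topics = model.topicsMatrix
    for (topic <- 0 until 2) {
      for (word <- 0 until model.vocabSize) {
        print(s"${topics(word, topic)} ")
      }
      println()
    }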

Re: How to test https://issues.apache.org/jira/browse/SPARK-10648 fix

2015-12-03 Thread Jean-Baptiste Onofré
Hi Rajesh, you can check out the codebase and build it yourself in order to test: git clone https://git-wip-us.apache.org/repos/asf/spark cd spark mvn clean package -DskipTests You will have bin, sbin and conf folders to try it. Regards JB On 12/03/2015 09:39 AM, Madabhattula Rajesh Kumar wrote: Hi

Building spark 1.3 from source code to work with Hive 1.2.1

2015-12-03 Thread Mich Talebzadeh
Hi, I have seen mails stating that users have managed to build Spark 1.3 to work with Hive. I tried Spark 1.5.2 but had no luck. I downloaded the Spark 1.3 source code (spark-1.3.0.tar) and built it as follows: ./make-distribution.sh --name "hadoop2-without-hive" --tgz

One Problem about Spark Dynamic Allocation

2015-12-03 Thread 谢廷稳
Hi all, I ran Spark 1.4 with Dynamic Allocation enabled. When it was running, I could see executors' information, such as ID, Address, Shuffle Read/Write, logs etc. But once an executor was removed, the web page no longer displays that executor; finally, the spark app's information in Spark

Re: How and where to update release notes for spark rel 1.6?

2015-12-03 Thread Jean-Baptiste Onofré
Hi Ravi, Even if it's not perfect, you can take a look at the current ReleaseNotes on JIRA: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315420&version=12333083 Regards JB On 12/03/2015 12:01 PM, RaviShankar KS wrote: Hi, How and where to update release notes for spark rel

Sparse Vector ArrayIndexOutOfBoundsException

2015-12-03 Thread nabegh
I'm trying to run an SVM classifier on unlabeled data. I followed this to build the vectors and checked this

Re: How to test https://issues.apache.org/jira/browse/SPARK-10648 fix

2015-12-03 Thread Ted Yu
See this thread for Spark 1.6.0 RC1 http://search-hadoop.com/m/q3RTtKdUViYHH1b1=+VOTE+Release+Apache+Spark+1+6+0+RC1+ Cheers On Thu, Dec 3, 2015 at 12:39 AM, Madabhattula Rajesh Kumar < mrajaf...@gmail.com> wrote: > Hi Team, > > Looks like this issue is fixed in 1.6 release. How to test this

Re: Multiplication on decimals in a dataframe query

2015-12-03 Thread Philip Dodds
I'll open up a JIRA for it, it appears to work when you use a literal number but not when it is coming from the same dataframe Thanks! P On Thu, Dec 3, 2015 at 1:52 AM, Sahil Sareen wrote: > +1 looks like a bug > > I think referencing trades() twice in multiplication is

Re: Multiplication on decimals in a dataframe query

2015-12-03 Thread Philip Dodds
Opened https://issues.apache.org/jira/browse/SPARK-12128 Thanks P On Thu, Dec 3, 2015 at 8:51 AM, Philip Dodds wrote: > I'll open up a JIRA for it, it appears to work when you use a literal > number but not when it is coming from the same dataframe > > Thanks! > > P >

Why does Spark job stucks and waits for only last tasks to get finished

2015-12-03 Thread unk1102
Hi, I have a Spark job where I keep a queue of 12 Spark jobs to execute in parallel. Now I see the job is almost completed and only one task is pending, and because of this last task the job keeps on waiting, as I can see in the UI. Please see the attached snaps. Please help me with how to stop Spark jobs from waiting for the last

Local mode: Stages hang for minutes

2015-12-03 Thread Richard Marscher
Hi, I'm doing some testing of workloads using local mode on a server. I see weird behavior where a job is submitted to the application and it just hangs for several minutes doing nothing. The stages are submitted as pending and in the application UI the stage view claims no tasks have been

Problem with RDD of (Long, Byte[Array])

2015-12-03 Thread Hervé Yviquel
Hi all, I have a problem when using Array[Byte] in RDD operations. When I join two different RDDs of type [(Long, Array[Byte])], I obtain wrong results... But if I translate the byte array to an integer and join two different RDDs of type [(Long, Integer)], then the results are correct... Any idea ?

AWS CLI --jars comma problem

2015-12-03 Thread Yusuf Can Gürkan
Hello I have a question about the AWS CLI for people who use it. I create a spark cluster with the AWS CLI and I'm using a spark step with jar dependencies. But as you can see below I cannot set multiple jars because the AWS CLI replaces the comma with a space in ARGS. Is there a way of doing it? I can accept

Re: Python API Documentation Mismatch

2015-12-03 Thread Felix Cheung
Please open an issue in JIRA, thanks! On Thu, Dec 3, 2015 at 3:03 AM -0800, "Roberto Pagliari" wrote: Hello, I believe there is a mismatch between the API documentation (1.5.2) and the software currently available. Not all functions mentioned here

Re: Local mode: Stages hang for minutes

2015-12-03 Thread Richard Marscher
I should add that the pauses are not from GC and also in tracing the CPU call tree in the JVM it seems like nothing is doing any work, just seems to be idling or blocking. On Thu, Dec 3, 2015 at 11:24 AM, Richard Marscher wrote: > Hi, > > I'm doing some testing of

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-03 Thread Cody Koeninger
Do you believe that all exceptions (including catastrophic ones like out of heap space) should be caught and silently discarded? Do you believe that a database system that runs out of disk space should silently continue to accept writes? What I am trying to say is, when something is broken in a

Re: How the cores are used in Directstream approach

2015-12-03 Thread Cody Koeninger
There's a 1:1 relationship between Kafka partitions and Spark partitions. Have you read https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md A direct stream job will use up to spark.executor.cores number of cores. If you have fewer partitions than cores, there probably won't be
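To make the partition relationship concrete, a minimal sketch (assuming an existing StreamingContext ssc; the broker and topic names are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic")) // placeholder topic

    stream.foreachRDD { rdd =>
      // One Spark partition per Kafka partition of the subscribed topics.
      println(s"partitions in this batch: ${rdd.partitions.length}")
    }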

understanding and disambiguating CPU-core related properties

2015-12-03 Thread Manolis Sifalakis1
I have found the documentation rather poor in helping me understand the interplay among the following properties in Spark, even more so how to set them. So this post is sent in hope of some discussion and "enlightenment" on the topic. Let me start by asking if I have understood well the
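For readers landing here later, a sketch of how the usual suspects fit together (the property names are Spark's; the values, and the assumption that these are the properties in question, are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("core-settings-demo")
      .set("spark.executor.cores", "4") // cores per executor (YARN/standalone)
      .set("spark.cores.max", "16")     // total cores for the app (standalone/Mesos)
      .set("spark.task.cpus", "1")      // cores reserved by each task
    val sc = new SparkContext(conf)
    // Max concurrent tasks per executor = spark.executor.cores / spark.task.cpus.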

Re: Multiplication on decimals in a dataframe query

2015-12-03 Thread Philip Dodds
Did a little more digging and it appears it was just the way I constructed the Decimal. It works if you do val data = Seq.fill(5) { Trade(Decimal(BigDecimal(5),38,20), Decimal(BigDecimal(5),38,20)) } On Thu, Dec 3, 2015 at 8:58 AM, Philip Dodds wrote: > Opened

Spark Streaming BackPressure and Custom Receivers

2015-12-03 Thread Deenar Toraskar
Hi I was going through the Spark Streaming BackPressure feature documentation and wanted to understand how I can ensure my custom receiver is able to handle rate limiting. I have a custom receiver similar to the TwitterInputDStream, but there is no obvious way to throttle what is being read from

Re: Does Spark streaming support iterative operator?

2015-12-03 Thread Sean Owen
Yes, in the sense that you can create and trigger an action on as many RDDs derived from the batch's RDD as you like. On Thu, Dec 3, 2015 at 8:04 PM, Wang Yangjun wrote: > Hi, > > In storm we could do thing like: > > TopologyBuilder builder = new TopologyBuilder(); > >
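A minimal sketch of that pattern, assuming a DStream[Int] named stream (the iteration count and per-step map are stand-ins):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    def iterativeJob(stream: DStream[Int]): Unit = {
      stream.foreachRDD { rdd =>
        // Derive as many RDDs from the batch's RDD as needed...
        var current: RDD[Int] = rdd
        for (_ <- 1 to 10) {
          current = current.map(_ + 1) // placeholder per-iteration transformation
        }
        // ...then trigger an action on the result.
        current.count()
      }
    }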

Does Spark streaming support iterative operator?

2015-12-03 Thread Wang Yangjun
Hi, In storm we could do things like: TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("spout", new NumberSpout()); builder.setBolt("mybolt", new Mybolt()) .shuffleGrouping("spout") .shuffleGrouping("mybolt", "iterativeStream"); It means that after one operation

Re: Spark Streaming from S3

2015-12-03 Thread Michele Freschi
Hi Steve, I'm on hadoop 2.7.1 using the s3n From: Steve Loughran Date: Thursday, December 3, 2015 at 4:12 AM Cc: SPARK-USERS Subject: Re: Spark Streaming from S3 > On 3 Dec 2015, at 00:42, Michele Freschi wrote: > >

RE: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Mich Talebzadeh
Hi Marcelo. So this is the approach I am going to take: use Spark 1.3 pre-built; use Hive 1.2.1; do not copy over anything from the Spark 1.3 libraries to add to the Hive libraries; use Hadoop 2.6. There is no need to mess around with the libraries. I will try to unset my CLASSPATH and reset it again and

Spark Streaming Running Out Of Memory in 1.5.0.

2015-12-03 Thread Augustus Hong
Hi All, I'm running Spark Streaming (Python) with Direct Kafka and I'm seeing that the memory usage will slowly go up and eventually kill the job in a few days. Everything runs fine at first but after a few days the job started issuing error: [Errno 104] Connection reset by peer, followed by

Re: Does Spark streaming support iterative operator?

2015-12-03 Thread Wang Yangjun
Hi, Thanks for your quick reply. Could you provide some pseudocode? It is a little hard to understand. Thanks Jun On 03/12/15 22:16, "Sean Owen" wrote: >Yes, in the sense that you can create and trigger an action on as many >RDDs created from the batch's RDD that you

Re: Kafka - streaming from multiple topics

2015-12-03 Thread Cody Koeninger
Yeah, that general plan should work, but might be a little awkward for adding topicPartitions after the fact (i.e. when you have stored offsets for some, but not all, of your topicpartitions). Personally I just query Kafka for the starting offsets if they don't exist in the DB, using the methods in
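A rough sketch of the offset-seeding half of that (the DB accessor is hypothetical, and an existing StreamingContext ssc and kafkaParams map are assumed):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Hypothetical accessor for offsets previously committed to your own store.
    def loadOffsetsFromDb(): Map[TopicAndPartition, Long] =
      Map(TopicAndPartition("mytopic", 0) -> 0L) // placeholder

    val fromOffsets = loadOffsetsFromDb()
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder,
        StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))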

SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread tomasr3
Hello, I believe to have encountered a bug with Spark 1.5.2. I am using RStudio and SparkR to read in JSON files with jsonFile(sqlContext, "path"). If "path" is a single path (e.g., "/path/to/dir0"), then it works fine; but, when "path" is a vector of paths (e.g. path <-

Spark SQL - Reading HCatalog Table

2015-12-03 Thread Sandip Mehta
Hi All, I have a table created in Hive and stored/read using HCatalog. The table is in ORC format. I want to read this table in Spark SQL and join it with RDDs. How can I connect to HCatalog and get data from Spark SQL? SM
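Not a full answer, but a minimal sketch under the assumption that the HCatalog-managed table lives in the Hive metastore Spark reads via hive-site.xml (database, table and join column are placeholders):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc) // picks up the metastore from hive-site.xml
    import hiveContext.implicits._

    // An ORC table registered through HCatalog is visible as an ordinary Hive table.
    val orcDF = hiveContext.sql("SELECT * FROM mydb.mytable") // placeholder names

    // To join with an RDD, convert the RDD to a DataFrame first.
    val otherDF = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "tag")
    val joined = orcDF.join(otherDF, "id") // assumes the table has an "id" column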

consumergroup not working

2015-12-03 Thread Hudong Wang
Hi, I am trying to read data from kafka in zookeeper mode with following code. val kafkaParams = Map[String, String] ( "zookeeper.connect" -> zookeeper, "metadata.broker.list" -> brokers, "group.id" -> consumerGroup, "auto.offset.reset" -> autoOffsetReset) return

newbie best practices: is spark-ec2 intended to be used to manage long-lasting infrastructure ?

2015-12-03 Thread Andy Davidson
About 2 months ago I used spark-ec2 to set up a small cluster. The cluster runs a spark streaming app 7x24 and stores the data to hdfs. I also need to run some batch analytics on the data. Now that I have a little more experience I wonder if this was a good way to set up the cluster the following

Re: How and where to update release notes for spark rel 1.6?

2015-12-03 Thread Andy Davidson
Hi JB, Do you know where I can find instructions for upgrading an existing installation? I searched the link you provided for "update" and "upgrade". Kind regards Andy From: Jean-Baptiste Onofré Date: Thursday, December 3, 2015 at 5:29 AM To: "user @spark"

Re: Local mode: Stages hang for minutes

2015-12-03 Thread Richard Marscher
Ended up realizing I was only looking at the call tree for running threads. After looking at blocking threads I saw that it was spending hundreds of compute hours blocking on jets3t calls to S3. Realized it was looking over likely thousands if not hundreds of thousands of S3 files accumulated over

Spark java.lang.SecurityException: class “javax.servlet.FilterRegistration”' with sbt

2015-12-03 Thread Moises Baly
Hi all, I'm having issues with javax.servlet when running simple Spark jobs. I'm using Scala + sbt and found a solution for this error; the problem is, this particular solution is not working when running tests. Any idea how I can exclude all conflicting dependencies for all scopes? Here is my partial

Re: Kafka - streaming from multiple topics

2015-12-03 Thread Dan Dutrow
Hey Cody, I'm convinced that I'm not going to get the functionality I want without using the Direct Stream API. I'm now looking through https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#exactly-once-using-transactional-writes where you say "For the very first time the job is

Re: Spark Streaming Running Out Of Memory in 1.5.0.

2015-12-03 Thread Ted Yu
bq. lambda part: save_sets(part, KEY_SET_NAME, Where do you save the part to ? For OutOfMemoryError, the last line was from Utility.scala Anything before that ? Thanks On Thu, Dec 3, 2015 at 11:47 AM, Augustus Hong wrote: > Hi All, > > I'm running Spark Streaming

sparkavro for PySpark 1.3

2015-12-03 Thread YaoPau
How can I read from and write to Avro using PySpark in 1.3? I can only find the 1.4 documentation, which uses a sqlContext.read method that isn't available to me in 1.3.

Re: Spark Streaming Specify Kafka Partition

2015-12-03 Thread Alan Braithwaite
One quick newbie question since I got another chance to look at this today. We're using java for our spark applications. The createDirectStream we were using previously [1] returns a JavaPairInputDStream, but the createDirectStream with fromOffsets expects an argument recordClass to pass into
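For comparison, in the Scala API the record type is simply whatever the messageHandler returns; a small sketch (ssc, kafkaParams and fromOffsets assumed to exist), which may help map to the Java recordClass argument:

    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // The fifth type parameter (String here) plays the role of recordClass:
    // it is the return type of the messageHandler.
    val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
        StringDecoder, String](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => mmd.message)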

Re: Problem with RDD of (Long, Byte[Array])

2015-12-03 Thread Josh Rosen
Are the keys that you're joining on the byte arrays themselves? If so, that's not likely to work because of how Java computes arrays' hashCodes; see https://issues.apache.org/jira/browse/SPARK-597. If this turns out to be the problem, we should look into strengthening the checks for array-type
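The effect is easy to reproduce in a Scala shell, since Java arrays use identity-based equals/hashCode:

    val a = Array[Byte](1, 2, 3)
    val b = Array[Byte](1, 2, 3)
    println(a == b)                   // false: reference equality on arrays
    println(a.hashCode == b.hashCode) // almost certainly false: identity hashCodes

    // Equal byte-array keys therefore never meet in a join; wrapping them in an
    // immutable Seq (structural equality) is one possible workaround.
    println(a.toSeq == b.toSeq)       // true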

Re: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Marcelo Vanzin
(bcc: user@spark, since this is Hive code.) You're probably including unneeded Spark jars in Hive's classpath somehow. Either the whole assembly or spark-hive, both of which will contain Hive classes, and in this case contain old versions that conflict with the version of Hive you're running. On

RE: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Mich Talebzadeh
Thanks, I tried them all :( I am trying to make Hive use Spark, and apparently Hive can use version 1.3 of Spark as its execution engine. Frankly I don't know why this is not working! Mich Talebzadeh Sybase ASE 15 Gold Medal Award 2008 A Winning Strategy: Running the most Critical Financial

how to spark streaming application start working on next batch before completing on previous batch .

2015-12-03 Thread prateek arora
Hi, I am using Spark Streaming with Kafka. The Spark version is 1.5.0 and the batch interval is 1 sec. In my scenario, the algorithm takes 7-10 sec to process one batch period's data, so spark streaming starts processing the next batch only after completing the previous batch. I want my spark streaming

jdbc error, ClassNotFoundException: org.apache.hadoop.hive.schshim.FairSchedulerShim

2015-12-03 Thread zhangjp
Hi all, I downloaded the prebuilt version 1.5.2 with Hadoop 2.6. When I use spark-sql there is no problem, but when I start the thriftServer and then want to query a Hive table using JDBC there are errors as follows. Caused by: java.lang.ClassNotFoundException:

Re: SparkR in Spark 1.5.2 jsonFile Bug Found

2015-12-03 Thread Felix Cheung
It looks like this has been broken around Spark 1.5. Please see JIRA SPARK-10185. This has been fixed in pyspark but unfortunately SparkR was missed. I have confirmed this is still broken in Spark 1.6. Could you please open a JIRA? On Thu, Dec 3, 2015 at 2:08 PM -0800, "tomasr3"

spark master - run-tests error

2015-12-03 Thread wei....@kaiyuandao.com
hi, does anyone know why I get the following error when running tests after a successful full build? thanks [root@sandbox spark_git]# dev/run-tests ** File "./dev/run-tests.py", line 68, in

Re: spark master - run-tests error

2015-12-03 Thread Ted Yu
The commit on the last line led to: commit 50a0496a43f09d70593419efc38587c8441843bf Author: Brennon York Date: Wed Jun 17 12:00:34 2015 -0700 When did you last update your workspace ? Cheers On Thu, Dec 3, 2015 at 6:09 PM, wei@kaiyuandao.com <

Re: Re: spark master - run-tests error

2015-12-03 Thread wei....@kaiyuandao.com
I was using the latest master branch. From: Ted Yu Date: 2015-12-04 10:14 To: wei@kaiyuandao.com CC: user Subject: Re: spark master - run-tests error The commit on last line led to: commit 50a0496a43f09d70593419efc38587c8441843bf Author: Brennon York Date:

Re: spark master - run-tests error

2015-12-03 Thread Ted Yu
From dev/run-tests.py: def identify_changed_files_from_git_commits(patch_sha, target_branch=None, target_ref=None): """ Given a git commit and target ref, use the set of files changed in the diff in order to determine which modules' tests should be run. Looks like the script needs

Spark Streaming Shuffle to Disk

2015-12-03 Thread Steven Pearson
I'm running a Spark Streaming job on 1.3.1 which contains an updateStateByKey. The job works perfectly fine, but at some point (after a few runs), it starts shuffling to disk no matter how much memory I give the executors. I have tried changing --executor-memory on spark-submit,

Creating a dataframe with decimals changes the precision and scale

2015-12-03 Thread Philip Dodds
I'm not sure if there is a way around this, just looking for advice. I create a dataframe from some decimals with a specific precision and scale; then, when I look at the dataframe, it has defaulted the precision and scale back again. Is there a way to retain the precision and scale when doing a

Re: Column Aliases are Ignored in callUDF while using struct()

2015-12-03 Thread Sachin Aggarwal
Hi, has anyone faced this error? Is there any workaround for this issue? thanks On Thu, Dec 3, 2015 at 4:28 PM, Sahil Sareen wrote: > Attaching the JIRA as well for completeness: > https://issues.apache.org/jira/browse/SPARK-12117 > > On Thu, Dec 3, 2015 at 4:13 PM, Sachin

Re: Problem with RDD of (Long, Byte[Array])

2015-12-03 Thread Hervé Yviquel
Hi Josh, Thanks for the answer. No, in my case, the byte arrays are the values... I use indexes generated by zipWithIndex as the keys (I invert the RDD to put them in front). However, if I clone the byte arrays before joining the RDDs, it seems to fix my problem (but I'm not sure why) --

Re: Creating a dataframe with decimals changes the precision and scale

2015-12-03 Thread Ted Yu
Looks like what you observed is due to the following code in Decimal.scala:

    def set(decimal: BigDecimal, precision: Int, scale: Int): Decimal = {
      this.decimalVal = decimal.setScale(scale, ROUND_HALF_UP)
      require(
        decimalVal.precision <= precision,
        s"Decimal precision
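Which also explains the workaround reported earlier in the thread: constructing the Decimal with an explicit precision and scale keeps them. A tiny sketch:

    import org.apache.spark.sql.types.Decimal

    // set() rescales the value and then requires it to fit the target precision,
    // so passing precision/scale explicitly preserves them.
    val d = Decimal(BigDecimal(5), 38, 20)
    println(s"${d.precision}, ${d.scale}") // 38, 20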

Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Mich Talebzadeh
Trying to run Hive on Spark 1.3 engine, I get conf hive.spark.client.channel.log.level=null --conf hive.spark.client.rpc.max.size=52428800 --conf hive.spark.client.rpc.threads=8 --conf hive.spark.client.secret.bits=256 15/12/03 17:53:18 [stderr-redir-1]: INFO client.SparkClientImpl: Spark

Re: Local mode: Stages hang for minutes

2015-12-03 Thread Ali Tajeldin EDU
You can try running "jstack" a couple of times while the app is hung, to look for patterns showing where it is stuck. -- Ali On Dec 3, 2015, at 8:27 AM, Richard Marscher wrote: > I should add that the pauses are not from GC and also in tracing the CPU call > tree in

Re: Any clue on this error, Exception in thread "main" java.lang.NoSuchFieldError: SPARK_RPC_CLIENT_CONNECT_TIMEOUT

2015-12-03 Thread Marcelo Vanzin
On Thu, Dec 3, 2015 at 10:32 AM, Mich Talebzadeh wrote: > hduser@rhes564::/usr/lib/spark/logs> hive --version > SLF4J: Found binding in > [jar:file:/usr/lib/spark/lib/spark-assembly-1.3.0-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class] As I suggested before, you