RE: query avro hive table in spark sql

2015-08-27 Thread java8964
What version of Hive are you using? And did you compile against the right version of Hive when you compiled Spark? BTW, spark-avro works great in our experience, but still, some non-tech people just want to use a SQL shell in Spark, like the Hive CLI. Yong From: mich...@databricks.com Date: Wed,

Re: spark streaming 1.3 kafka topic error

2015-08-27 Thread Ahmed Nawar
Dears, I need to commit a DB transaction for each partition, not for each row. The code below didn't work for me. rdd.mapPartitions(partitionOfRecords => { DBConnectionInit() val results = partitionOfRecords.map(..) DBConnection.commit() }) Best regards, Ahmed Atef Nawwar Data Management

Best way to filter null on any column?

2015-08-27 Thread Saif.A.Ellafi
Hi all, What would be a good way to filter rows in a dataframe where any value in a row is null? I wouldn't want to go through each column manually. Thanks, Saif
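A minimal sketch of one approach, assuming a DataFrame named df (Spark 1.3+ ships DataFrameNaFunctions for exactly this):

    // Drop any row that contains a null in any column.
    val clean = df.na.drop()

    // Equivalent explicit form, building one predicate across all columns:
    import org.apache.spark.sql.functions.col
    val noNulls = df.columns.map(c => col(c).isNotNull).reduce(_ && _)
    val clean2 = df.filter(noNulls)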

Adding Kafka topics to a running streaming context

2015-08-27 Thread yael aharon
Hello, My streaming application needs to allow consuming new Kafka topics at arbitrary times. I know I can stop and start the streaming context when I need to introduce a new stream, but that seems quite disruptive. I am wondering if other people have this situation and if there is a more elegant

Re: Adding Kafka topics to a running streaming context

2015-08-27 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEX)
This is something we have been needing for a while too. We are restarting the streaming context to handle new topic subscriptions/unsubscriptions, which affects the latency of update handling. I think this is something that needs to be addressed in core Spark Streaming (I can't think of any

Re: Adding Kafka topics to a running streaming context

2015-08-27 Thread Cody Koeninger
If you all can say a little more about what your requirements are, maybe we can get a jira together. I think the easiest way to deal with this currently is to start a new job before stopping the old one, which should prevent latency problems. On Thu, Aug 27, 2015 at 9:24 AM, Sudarshan Kadambi

Re: spark streaming 1.3 kafka topic error

2015-08-27 Thread Cody Koeninger
Map is lazy. You need an actual action, or nothing will happen. Use foreachPartition, or do an empty foreach after the map. On Thu, Aug 27, 2015 at 8:53 AM, Ahmed Nawar ahmed.na...@gmail.com wrote: Dears, I need to commit a DB transaction for each partition, not for each row. The code below

spark-submit issue

2015-08-27 Thread pranay
I have a java program that does this - (using Spark 1.3.1) Create a command string that uses spark-submit in it (with my Class file etc), and I store this string in a temp file somewhere as a shell script. Using Runtime.exec, I execute this script and wait for its completion, using

RE: Help! Stuck using withColumn

2015-08-27 Thread Saif.A.Ellafi
Hello, thank you for the response. I found a blog where a guy explains that it is not possible to join columns from different data frames. I was trying to modify one column's information by selecting it and then trying to replace the original dataframe column. I found another way. Thanks, Saif
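For reference, the usual in-place route is withColumn with the existing column name, which replaces the column rather than joining across data frames. A sketch, with df and the column name "amount" as assumed stand-ins:

    import org.apache.spark.sql.functions.col
    // Using an existing name replaces the column instead of adding a new one.
    val updated = df.withColumn("amount", col("amount") * 100)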

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
I suggest taking a heap dump of the driver process using jmap. Then open that dump in a tool like VisualVM to see which object(s) are taking up heap space. It is easy to do. We did this and found out that in our case it was the data structure that stores info about stages, jobs and tasks. There can

sbt error -- before Terasort compilation

2015-08-27 Thread Shreeharsha G Neelakantachar
Hi, I am getting the sbt error below when trying to compile a TeraSort sbt file with the following contents. Please suggest what can be tried. (TeraSort is written in Scala for our benchmark.) root@:~# more teraSort/tera.sbt name := "IBM ARL TeraSort" version := "1.0"

Re: spark streaming 1.3 kafka buffer size

2015-08-27 Thread Cody Koeninger
As it stands currently, no. If you're already overriding the dstream, it would be pretty straightforward to change the kafka parameters used when creating the rdd for the next batch though On Wed, Aug 26, 2015 at 11:41 PM, Shushant Arora shushantaror...@gmail.com wrote: Can I change this param

commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Dears, I need to commit a DB transaction for each partition, not for each row. The code below didn't work for me. rdd.mapPartitions(partitionOfRecords => { DBConnectionInit() val results = partitionOfRecords.map(..) DBConnection.commit() results }) Best regards, Ahmed Atef Nawwar Data

Spark Streaming Listener to Kill Stages?

2015-08-27 Thread suchenzang
Hello, I was wondering if there's a way to kill stages in spark streaming via a StreamingListener. Sometimes stages will hang, and simply killing them via the UI: http://apache-spark-user-list.1001560.n3.nabble.com/file/n24478/Screen_Shot_2015-08-27_at_11.png is enough to let the streaming job

Re: Writing test case for spark streaming checkpointing

2015-08-27 Thread Cody Koeninger
Kill the job in the middle of a batch, look at the worker logs to see which offsets were being processed, and verify the messages for those offsets are read when you start the job back up. On Thu, Aug 27, 2015 at 10:14 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi! I have enabled check

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
I was using a different build of Spark, compiled against a different version of Hive, before. The error which I see now is: org.apache.hadoop.hive.serde2.avro.BadSchemaException at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:195) at

Getting number of physical machines in Spark

2015-08-27 Thread Young, Matthew T
What's the canonical way to find out the number of physical machines in a cluster at runtime in Spark? I believe SparkContext.defaultParallelism will give me the number of cores, but I'm interested in the number of NICs. I'm writing a Spark streaming application to ingest from Kafka with the
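A rough approximation rather than an exact NIC count: SparkContext.getExecutorMemoryStatus is keyed by "host:port" strings, so counting distinct hosts gives the machines currently running executors. A sketch:

    // Keys look like "host:port"; note the driver appears in this map too.
    val hosts = sc.getExecutorMemoryStatus.keys.map(_.split(":")(0)).toSet
    println(s"${hosts.size} distinct hosts")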

Re: Adding Kafka topics to a running streaming context

2015-08-27 Thread Sudarshan Kadambi (BLOOMBERG/ 731 LEX)
I put up https://issues.apache.org/jira/browse/SPARK-10320. Let's continue this conversation there. From: c...@koeninger.org At: Aug 27 2015 10:30:12 To: Sudarshan Kadambi (BLOOMBERG/ 731 LEX) Cc: user@spark.apache.org Subject: Re: Adding Kafka topics to a running streaming context If you all

RE: query avro hive table in spark sql

2015-08-27 Thread java8964
You can run a Hive query with spark-avro, but you cannot query a Hive view with spark-avro, as the view is stored in the Hive metadata. What do you mean by the right version of Spark, i.e. that the "can't determine table schema" problem is fixed? I faced this problem before, and my guess is the Hive

Re: sbt error -- before Terasort compilation

2015-08-27 Thread Jacek Laskowski
On Thu, Aug 27, 2015 at 2:59 PM, Shreeharsha G Neelakantachar shreeharsh...@in.ibm.com wrote: root@:~# sbt package Getting org.scala-sbt sbt 0.13.9 ... :: problems summary :: WARNINGS module not found: org.scala-sbt#sbt;0.13.9 ... ERRORS Server access

Spark driver locality

2015-08-27 Thread Swapnil Shinde
Hello, I am new to the Spark world and started exploring recently in standalone mode. It would be great if I could get clarification on the doubts below: 1. Driver locality - It is mentioned in the documentation that client deploy-mode is not good if the machine running spark-submit is not co-located with the worker

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive which is based on multiple tables. On Thu, Aug 27, 2015 at 9:41 AM, Giri P gpatc...@gmail.com wrote: we are using Hive 1.1. I was able to fix the below error when I used the right version

Writing test case for spark streaming checkpointing

2015-08-27 Thread Hafiz Mujadid
Hi! I have enabled checkpointing in Spark Streaming with Kafka. I can see that Spark Streaming is checkpointing to the mentioned directory on HDFS. How can I test that it works fine and recovers with no data loss? Thanks -- View this message in context:

Is there a way to store RDD and load it with its original format?

2015-08-27 Thread Saif.A.Ellafi
Hi, Is there any way to store/load RDDs keeping their original object type instead of string? I am having trouble with parquet (there is always some error at class conversion), and I don't use hadoop. Looking for alternatives. Thanks in advance, Saif

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
Thanks for this tip. I ran it in yarn-client mode with driver-memory = 4G and took a dump once the heap got close to 4G. Top of the histogram (num: #instances #bytes class name): 1: 446169 3661137256 [J 2: 2032795 222636720

Re: query avro hive table in spark sql

2015-08-27 Thread Giri P
We are using Hive 1.1. I was able to fix the below error when I used the right version of Spark: 15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither

Re: application logs for long lived job on YARN

2015-08-27 Thread Chen Song
Does anyone have a similar problem or thoughts on this? On Wed, Aug 26, 2015 at 10:37 AM, Chen Song chen.song...@gmail.com wrote: When running a long-lived job on YARN like Spark Streaming, I found that container logs are gone after days on executor nodes, although the job itself is still running. I am

Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Thanks for the foreach idea. But once I used it I got an empty RDD, I think because results is an iterator. Yes, I know map is lazy, but I expected there to be a solution to force an action. I cannot use foreachPartition because I need to reuse the new RDD after some maps. On Thu, Aug 27, 2015 at 5:11 PM, Cody

Re: query avro hive table in spark sql

2015-08-27 Thread Michael Armbrust
BTW, spark-avro works great in our experience, but still, some non-tech people just want to use a SQL shell in Spark, like the Hive CLI. To clarify: you can still use the spark-avro library with pure SQL. Just use the CREATE TABLE ... USING com.databricks.spark.avro OPTIONS (path '...')
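A sketch of that pure-SQL route, with the table name and path as placeholders:

    sqlContext.sql("""
      CREATE TEMPORARY TABLE avro_events
      USING com.databricks.spark.avro
      OPTIONS (path "/data/events.avro")
    """)
    sqlContext.sql("SELECT COUNT(*) FROM avro_events").show()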

Re: Spark driver locality

2015-08-27 Thread Rishitesh Mishra
Hi Swapnil, Let me try to answer some of the questions. Answers inline. Hope it helps. On Thursday, August 27, 2015, Swapnil Shinde swapnilushi...@gmail.com wrote: Hello I am new to spark world and started to explore recently in standalone mode. It would be great if I get clarifications on

Re: Commit DB Transaction for each partition

2015-08-27 Thread Cody Koeninger
You need to return an iterator from the closure you provide to mapPartitions On Thu, Aug 27, 2015 at 1:42 PM, Ahmed Nawar ahmed.na...@gmail.com wrote: Thanks for foreach idea. But once i used it i got empty rdd. I think because results is an iterator. Yes i know Map is lazy but i expected

Re: Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Yes, of course, I am doing that. But once I added results.foreach(row => {}) I got an empty RDD. rdd.mapPartitions(partitionOfRecords => { DBConnectionInit() val results = partitionOfRecords.map(..) DBConnection.commit() results.foreach(row => {}) results }) On Thu, Aug 27, 2015 at 10:18

Re: Commit DB Transaction for each partition

2015-08-27 Thread Cody Koeninger
This job contains a spark output action, and is what I originally meant: rdd.mapPartitions { result }.foreach { } This job is just a transformation, and won't do anything unless you have another output action. Not to mention, it will exhaust the iterator, as you noticed: rdd.mapPartitions
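Putting the thread's advice together, a minimal sketch of the working pattern; DBConnectionInit, transform and commit are assumed stand-ins for the poster's DB code:

    val newRDD = rdd.mapPartitions { partition =>
      val conn = DBConnectionInit()
      val results = partition.map(transform).toList // consume the iterator before committing
      conn.commit()
      results.iterator // mapPartitions must return an iterator
    }
    newRDD.foreach { _ => () } // output action; without one, nothing runs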

Re: sbt error -- before Terasort compilation

2015-08-27 Thread Jacek Laskowski
On Thu, Aug 27, 2015 at 5:40 PM, Jacek Laskowski ja...@japila.pl wrote: Server access Error: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty

Data Frame support CSV or excel format ?

2015-08-27 Thread spark user
Hi all, Can we create a data frame from an Excel sheet or CSV file? In the below example it seems they support only JSON: DataFrame df = sqlContext.read().json("examples/src/main/resources/people.json");

Re: Data Frame support CSV or excel format ?

2015-08-27 Thread Michael Armbrust
Check out spark-csv: http://spark-packages.org/package/databricks/spark-csv On Thu, Aug 27, 2015 at 11:48 AM, spark user spark_u...@yahoo.com.invalid wrote: Hi all, Can we create a data frame from an Excel sheet or CSV file? In the below example it seems they support only JSON: DataFrame df =
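A sketch of reading CSV with that package (Spark 1.4+ reader API; the path and options are assumptions):

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // sample the data to pick column types
      .load("examples/src/main/resources/people.csv")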

Re: Spark driver locality

2015-08-27 Thread Swapnil Shinde
Thanks Rishitesh!! 1. I get that the driver doesn't need to be on the master, but there is a lot of communication between driver and cluster; that's why a co-located gateway was recommended. How big is the impact of the driver not being co-located with the cluster? 4. How does an HDFS split get assigned to a worker node

Re: types allowed for saveasobjectfile?

2015-08-27 Thread Holden Karau
Yes, any Java-serializable object. It's important to note that since it's saving serialized objects, it is as brittle as Java serialization when it comes to version changes, so if you can make your data fit in something like sequence files, Parquet, Avro, or similar it can be not only more space

Porting a multi-threaded compute-intensive job to Spark

2015-08-27 Thread Utkarsh Sengar
I am working on code which uses an executor service to parallelize tasks (think machine learning computations done over a small dataset over and over again). My goal is to execute some code as fast as possible, multiple times, and store the result somewhere (total executions will be on the order of 100M

types allowed for saveasobjectfile?

2015-08-27 Thread Arun Luthra
What types of RDD can saveAsObjectFile(path) handle? I tried a naive test with an RDD[Array[String]], but when I tried to read back the result with sc.objectFile(path).take(5).foreach(println), I got a non-promising output looking like: [Ljava.lang.String;@46123a [Ljava.lang.String;@76123b

Re: types allowed for saveasobjectfile?

2015-08-27 Thread Arun Luthra
Ah, yes, that did the trick. So more generally, can this handle any serializable object? On Thu, Aug 27, 2015 at 2:11 PM, Jonathan Coveney jcove...@gmail.com wrote: array[String] doesn't pretty print by default. Use .mkString(,) for example El jueves, 27 de agosto de 2015, Arun Luthra

tweet transformation ideas

2015-08-27 Thread Jesse F Chen
This is a question on general usage/best practice/best transformation method to use for a sentiment analysis on tweets... Input: Tweets (e.g., "@xyz, sorry but this movie is poorly scripted http://t.co/uyser876") - large data set, i.e. 1 billion tweets. Sentiment dictionary (e.g.,

Re: Commit DB Transaction for each partition

2015-08-27 Thread Ahmed Nawar
Thanks a lot for your support. It is working now. I wrote it like below: val newRDD = rdd.mapPartitions { partition => { val result = partition.map(...) result } } newRDD.foreach { } On Thu, Aug 27, 2015 at 10:34 PM, Cody Koeninger c...@koeninger.org wrote: This job contains a spark

TimeoutException on start-slave spark 1.4.0

2015-08-27 Thread Alexander Pivovarov
I see the following error from time to time when trying to start slaves on Spark 1.4.0: [hadoop@ip-10-0-27-240 apps]$ pwd /mnt/var/log/apps [hadoop@ip-10-0-27-240 apps]$ cat spark-hadoop-org.apache.spark.deploy.worker.Worker-1-ip-10-0-27-240.ec2.internal.out Spark Command: /usr/java/latest/bin/java -cp

Re: types allowed for saveasobjectfile?

2015-08-27 Thread Jonathan Coveney
Array[String] doesn't pretty-print by default. Use .mkString(","), for example. On Thursday, August 27, 2015, Arun Luthra arun.lut...@gmail.com wrote: What types of RDD can saveAsObjectFile(path) handle? I tried a naive test with an RDD[Array[String]], but when I tried to read back the

Re: types allowed for saveasobjectfile?

2015-08-27 Thread Holden Karau
So println of any array of strings will look like that. The java.util.Arrays class has some options to print arrays nicely. On Thu, Aug 27, 2015 at 2:08 PM, Arun Luthra arun.lut...@gmail.com wrote: What types of RDD can saveAsObjectFile(path) handle? I tried a naive test with an
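A quick sketch of the round trip, showing the objects come back intact and only the default Array toString was misleading (the path is a placeholder):

    val rdd = sc.parallelize(Seq(Array("a", "b"), Array("c", "d")))
    rdd.saveAsObjectFile("/tmp/arrays")
    sc.objectFile[Array[String]]("/tmp/arrays")
      .take(5)
      .foreach(arr => println(arr.mkString(","))) // prints a,b and c,d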

Array column stored as “.bag” in parquet file instead of “REPEATED INT64”

2015-08-27 Thread Jim Green
Hi Team, Say I have a test.json file: {"c1":[1,2,3]} I can create a parquet file like: var df = sqlContext.load("/tmp/test.json", "json") var df_c = df.repartition(1) df_c.select("*").save("/tmp/testjson_spark", "parquet") The output parquet file’s schema is like: c1: OPTIONAL F:1 .bag:

Spark Taking too long on K-means clustering

2015-08-27 Thread masoom alam
Hi everyone, I am trying to run the KDD data set - basically chapter 5 of the Advanced Analytics with Spark book. The data set is 789MB, but Spark is taking some 3 to 4 hours. Is this normal behaviour, or is some tuning required? The server RAM is 32 GB, but we can only give 4 GB RAM on 64-bit

How to avoid shuffle errors for a large join ?

2015-08-27 Thread Thomas Dudziak
I'm getting errors like "Removing executor with no recent heartbeats" and "Missing an output location for shuffle" for a large Spark SQL join (1bn rows/2.5TB joined with 1bn rows/30GB), and I'm not sure how to configure the job to avoid them. The initial stage completes fine with some 30k tasks on
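A hedged starting point rather than a definitive fix: raise shuffle parallelism so each task holds less data, and lengthen timeouts so heavily loaded executors aren't declared dead. The values below are guesses to tune:

    // More, smaller shuffle partitions (the Spark SQL default is 200).
    sqlContext.setConf("spark.sql.shuffle.partitions", "4000")
    // Submit-time equivalents for the timeout side:
    //   --conf spark.network.timeout=600s
    //   --conf spark.shuffle.io.maxRetries=10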

Re: Feedback: Feature request

2015-08-27 Thread Manish Amde
Hi James, It's a good idea. A JSON format is more convenient for visualization, though a little inconvenient to read. How about a toJson() method? It might make the MLlib API inconsistent across models though. You should probably create a JIRA for this. CC: dev list -Manish On Aug 26, 2015,

Re: How to remove worker node but let it finish first?

2015-08-27 Thread Akhil Das
You can create a custom Mesos framework for your requirement. To get you started, you can check this out: http://mesos.apache.org/documentation/latest/app-framework-development-guide/ Thanks Best Regards On Mon, Aug 24, 2015 at 12:11 PM, Romi Kuntsman r...@totango.com wrote: Hi, I have a spark

Re: How to increase the Json parsing speed

2015-08-27 Thread Gavin Yue
Just did some tests. I have 6000 files, each with 14K records, 900MB total file size. In Spark SQL it takes one task roughly 1 min to parse. On the local machine I used the same Jackson lib that ships inside the Spark lib and just parsed it: FileInputStream fstream = new FileInputStream(testfile);

How to increase the Json parsing speed

2015-08-27 Thread Gavin Yue
Hey, I am using the Json4s-Jackson parser that comes with Spark, parsing roughly 80m records with a total size of 900MB. But the speed is slow. It took my 50 nodes (16-core CPU, 100GB mem) roughly 30 mins to parse the JSON for use in Spark SQL. Jackson's benchmarks say parsing should be at the ms level.
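One common culprit at this scale is constructing a parser per record. A sketch that reuses a single Jackson ObjectMapper per partition (lines is an assumed RDD[String] of raw JSON):

    import com.fasterxml.jackson.databind.ObjectMapper
    val parsed = lines.mapPartitions { iter =>
      val mapper = new ObjectMapper() // built once per partition, not per record
      iter.map(line => mapper.readTree(line))
    }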

Re: tweet transformation ideas

2015-08-27 Thread Holden Karau
It seems like this might be better suited to a broadcast hash map, since 200k entries isn't that big. You can then map over the tweets and look up each word in the broadcast map. On Thursday, August 27, 2015, Jesse F Chen jfc...@us.ibm.com wrote: This is a question on general usage/best
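A sketch of that suggestion, with the dictionary loading left as an assumed stand-in:

    val sentiment: Map[String, Double] = loadDictionary() // assumed helper, ~200k entries
    val dict = sc.broadcast(sentiment) // shipped once per executor, not per task
    val scores = tweets.map { text =>
      val words = text.toLowerCase.split("\\s+")
      (text, words.flatMap(w => dict.value.get(w)).sum) // sum of matched word scores
    }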

Re: suggest configuration for debugging spark streaming, kafka

2015-08-27 Thread Jacek Laskowski
On Wed, Aug 26, 2015 at 11:02 PM, Joanne Contact joannenetw...@gmail.com wrote: Hi, I have an Ubuntu box with 4GB memory and dual cores. Do you think it won't be enough to run Spark Streaming and Kafka? I'm trying to install standalone-mode Spark and Kafka so I can debug them in an IDE. Do I need to install

Re: Build k-NN graph for large dataset

2015-08-27 Thread Jaonary Rabarisoa
Thank you all for these links. I'll check them. On Wed, Aug 26, 2015 at 5:05 PM, Charlie Hack charles.t.h...@gmail.com wrote: +1 to all of the above esp. Dimensionality reduction and locality sensitive hashing / min hashing. There's also an algorithm implemented in MLlib called DIMSUM which

Re: query avro hive table in spark sql

2015-08-27 Thread ponkin
Can you select something from this table using Hive? And also could you post your spark code which leads to this exception. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/query-avro-hive-table-in-spark-sql-tp24462p24468.html Sent from the Apache Spark User

Re: error accessing vertexRDD

2015-08-27 Thread ponkin
Check permissions for the user which runs spark-shell. "(Permission denied)" means that you do not have permissions on /tmp -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/error-accessing-vertexRDD-tp24466p24469.html Sent from the Apache Spark User List mailing

Re: creating data warehouse with Spark and running query with Hive

2015-08-27 Thread Akhil Das
Can you paste the stack trace? Is it complaining that the directory already exists? Thanks Best Regards On Thu, Aug 20, 2015 at 11:23 AM, Jeetendra Gangele gangele...@gmail.com wrote: HI All, I have data in an HDFS partition with Year/month/data/event_type. And I am creating hive tables with

Re: load NULL Values in RDD

2015-08-27 Thread Akhil Das
Are you reading the column data from a SQL database? If so, have a look at the JdbcRDD. http://www.infoobjects.com/spark-sql-jdbcrdd/ Thanks Best Regards On Fri, Aug 21, 2015 at 1:25 AM, SAHA, DEBOBROTA ds3...@att.com wrote: Hi , Can anyone help me in loading a column that may or may not

Re: SQLContext load. Filtering files

2015-08-27 Thread Masf
Thanks Akhil, I will have a look. I have a doubt regarding Spark Streaming and fileStream. If Spark Streaming crashes and, while Spark was down, new files are created in the input folder, when Spark Streaming is launched again, how can I process those files? Thanks. Regards. Miguel. On Thu, Aug

Re: sparkStreaming how to work with partitions, how to create partitions

2015-08-27 Thread Akhil Das
You can use the driver UI and click through Jobs -> Stages to see the number of partitions being created for that job. If you want to increase the partitions, then you could do a .repartition too. With the directStream API, I guess the # of partitions in Spark will be equal to the number of partitions in

Re: SQLContext load. Filtering files

2015-08-27 Thread Akhil Das
Have a look at Spark Streaming. You can make use of ssc.fileStream. E.g.: val avroStream = ssc.fileStream[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](input) You can also specify a filter function
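For reference, a sketch of the filter-function variant of fileStream, using plain-text Hadoop types (the Avro case is analogous):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    val stream = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/input",
      (path: Path) => !path.getName.startsWith("."), // skip hidden/temp files
      newFilesOnly = true)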

Re: How to set the number of executors and tasks in a Spark Streaming job in Mesos

2015-08-27 Thread Akhil Das
How many Mesos slaves do you have? And how many cores in total? sparkConf.set("spark.mesos.coarse", "true") sparkConf.set("spark.cores.max", "128") These two configurations are sufficient. Now regarding the active tasks, how many partitions are you seeing for that job? You

Re: org.apache.hadoop.security.AccessControlException: Permission denied when access S3

2015-08-27 Thread Akhil Das
Did you try with s3a? Also make sure your key does not have any wildcard chars in it. Thanks Best Regards On Fri, Aug 21, 2015 at 2:03 AM, Shuai Zheng szheng.c...@gmail.com wrote: Hi All, I am trying to access an S3 file in Hadoop file format. Below is my code:

Re: How frequently should full gc we expect

2015-08-27 Thread Akhil Das
I used to hit "GC: Overhead limit exceeded" and sometimes "GC: Heap error" too. Thanks Best Regards On Sat, Aug 22, 2015 at 2:44 AM, java8964 java8...@hotmail.com wrote: In the test job I am running in Spark 1.3.1 in our stage cluster, I can see the following information on the application stage

Re: SQLContext load. Filtering files

2015-08-27 Thread Akhil Das
If you have enabled checkpointing, Spark will handle that for you. Thanks Best Regards On Thu, Aug 27, 2015 at 4:21 PM, Masf masfwo...@gmail.com wrote: Thanks Akhil, I will have a look. I have a doubt regarding Spark Streaming and fileStream. If Spark Streaming crashes and while Spark
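A sketch of that checkpoint-driven recovery; the directory and batch interval are assumptions:

    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(10))
      ssc.checkpoint("hdfs:///checkpoints/myapp")
      // build the fileStream and transformations here
      ssc
    }
    // Clean start: builds a fresh context. After a crash: reconstructs the
    // context and its progress from the checkpoint directory instead.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/myapp", createContext _)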

Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
I have a Spark v1.4.1 on YARN job where the first stage has ~149,000 tasks (it’s reading a few TB of data). The job itself is fairly simple - it’s just getting a list of distinct values: val days = spark.sequenceFile(inputDir, classOf[KeyClass], classOf[ValueClass])

RE: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ewan Leith
Are you using the Kryo serializer? If not, have a look at it, it can save a lot of memory during shuffles https://spark.apache.org/docs/latest/tuning.html I did a similar task and had various issues with the volume of data being parsed in one go, but that helped a lot. It looks like the main
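A sketch of what that looks like; KeyClass and ValueClass come from the original job:

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[KeyClass], classOf[ValueClass]))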

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread andrew.rowson
I should have mentioned: yes I am using Kryo and have registered KeyClass and ValueClass. I guess it’s not clear to me what is actually taking up space on the driver heap - I can’t see how it can be data with the code that I have. On 27/08/2015 12:09, Ewan Leith ewan.le...@realitymine.com

Selecting different levels of nested data records during one select?

2015-08-27 Thread Ewan Leith
Hello, I'm trying to query a nested data record of the form: root |-- userid: string (nullable = true) |-- datarecords: array (nullable = true) |    |-- element: struct (containsNull = true) |    |    |-- name: string (nullable = true) |    |    |-- system: boolean (nullable = true) |    |
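One hedged approach for this shape of schema, assuming Spark 1.4+ where functions.explode is available: flatten the array of structs, carrying the parent field along:

    import org.apache.spark.sql.functions.{col, explode}
    val flat = df
      .select(col("userid"), explode(col("datarecords")).as("rec"))
      .select(col("userid"), col("rec.name"), col("rec.system"))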

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
Hi, Is this a known issue of building Spark from today's sources using Scala 2.11? I did the following: ➜ spark git:(master) ✗ git rev-parse HEAD de0278286cf6db8df53b0b68918ea114f2c77f1f ➜ spark git:(master) ✗ ./dev/change-scala-version.sh 2.11 ➜ spark git:(master) ✗ ./build/mvn -Pyarn

Re: [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
Hi, I'm trying to nail it down myself, too. Is there anything relevant to help on my side? Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at https://twitter.com/jaceklaskowski Upvote at

Re: [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Jacek Laskowski
Hi, Sean helped me offline and I sent https://github.com/apache/spark/pull/8479 for review. That's the only breaking place for the build I could find. Tested with 2.10 and 2.11. Pozdrawiam, Jacek -- Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl Follow me at

Re: [ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile

2015-08-27 Thread Sean Owen
Hm, if anything that would be related to my change at https://issues.apache.org/jira/browse/SPARK-9613. Let me investigate. There's some chance there is some Scala 2.11-specific problem here. On Thu, Aug 27, 2015 at 8:51 AM, Jacek Laskowski ja...@japila.pl wrote: Hi, Is this a known issue of