[SPARK-SQL]how to run cache command with Running the Thrift JDBC/ODBC server

2014-12-19 Thread jeanlyn92
When I run the *cache table as* statement in beeline, which communicates with the Thrift server, I get the following error: 14/12/19 15:57:05 ERROR ql.Driver: FAILED: ParseException line 1:0 cannot recognize input near 'cache' 'table' 'jeanlyntest' org.apache.hadoop.hive.ql.parse.ParseException: line 1:0

Who manage the log4j appender while running spark on yarn?

2014-12-19 Thread WangTaoTheTonic
Hi guys, I recently ran Spark on YARN and found that Spark didn't set any log4j properties file in configuration or code, yet the log4j output was being written to the stderr file under ${yarn.nodemanager.log-dirs}/application_${appid}. I want to know which side (Spark or Hadoop) controls the appender? Have

Re: [SPARK-SQL]how to run cache command with Running the Thrift JDBC/ODBC server

2014-12-19 Thread Cheng Lian
It seems that the Thrift server you connected to is the original HiveServer2 rather than Spark SQL's HiveThriftServer2. On 12/19/14 4:08 PM, jeanlyn92 wrote: When I run the *cache table as* statement in beeline, which communicates with the Thrift server, I get the following error: 14/12/19 15:57:05 ERROR

Re: Re: When will spark 1.2 released?

2014-12-19 Thread vboylin1...@gmail.com
Wow. Nice to hear that :) Keep learning. On 2014-12-19 15:10, Matei Zaharia wrote: Yup, as he posted before, an Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight. On Dec 18, 2014,

Announcing Spark 1.2!

2014-12-19 Thread Patrick Wendell
I'm happy to announce the availability of Spark 1.2.0! Spark 1.2.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 172 developers and more than 1,000 commits! This release brings operational and performance improvements in Spark

Re: Can we specify driver running on a specific machine of the cluster on yarn-cluster mode?

2014-12-19 Thread Sean Owen
That's not true in yarn-cluster mode, where the driver runs in a container that YARN creates, which may not be on the machine that runs spark-submit. As far as I know, however, you can't control where YARN allocates that, and shouldn't need to. You can probably query YARN to find where it did

Re: UNION two RDDs

2014-12-19 Thread Sean Owen
coalesce actually changes the number of partitions. Unless the original RDD had just 1 partition, coalesce(1) will make an RDD with 1 partition that is larger than the original partitions, of course. I don't think the question is about ordering of things within an element of the RDD? If the
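A quick spark-shell illustration of that point (toy data):

    val rdd = sc.parallelize(1 to 10, 4)    // 4 partitions
    val merged = rdd.coalesce(1)            // no shuffle by default; one larger partition
    println(rdd.partitions.length)          // 4
    println(merged.partitions.length)       // 1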

Re: Announcing Spark 1.2!

2014-12-19 Thread Shixiong Zhu
Congrats! A little question about this release: which commit is this release based on? v1.2.0 and v1.2.0-rc2 point to different commits in https://github.com/apache/spark/releases Best Regards, Shixiong Zhu 2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com: I'm happy to

How to run an action and get output?

2014-12-19 Thread Ashic Mahtab
Hi, say we have an operation that writes something to an external resource and gets some output. For example: def doSomething(entry: SomeEntry, session: Session): SomeOutput = { val result = session.SomeOp(entry); SomeOutput(entry.Key, result.SomeProp) } I could use a transformation for

Re: Announcing Spark 1.2!

2014-12-19 Thread Sean Owen
Tag 1.2.0 is older than 1.2.0-rc2. I wonder if it just didn't get updated. I assume it's going to be 1.2.0-rc2 plus a few commits related to the release process. On Fri, Dec 19, 2014 at 9:50 AM, Shixiong Zhu zsxw...@gmail.com wrote: Congrats! A little question about this release: Which commit

Re: java.lang.ExceptionInInitializerError/Unable to load YARN support

2014-12-19 Thread Sean Owen
You've got Kerberos enabled, and it's complaining that YARN doesn't like the Kerberos config. Have you verified this should be otherwise working, sans Spark? On Fri, Dec 19, 2014 at 3:50 AM, maven niranja...@gmail.com wrote: All, I just built Spark-1.2 on my enterprise server (which has

spark streaming python + kafka

2014-12-19 Thread Oleg Ruchovets
Hi, I've just seen that Spark Streaming supports Python from version 1.2. Question: does Spark Streaming (Python version) support Kafka integration? Thanks, Oleg.

How to run an action and get output?‏

2014-12-19 Thread ashic
Hi, say we have an operation that writes something to an external resource and gets some output. For example: def doSomething(entry: SomeEntry, session: Session): SomeOutput = { val result = session.SomeOp(entry); SomeOutput(entry.Key, result.SomeProp) } I could use a transformation for

Does Spark 1.2.0 support Scala 2.11?

2014-12-19 Thread Jonathan Chayat
The following ticket: https://issues.apache.org/jira/browse/SPARK-1812 for supporting 2.11 has been marked as fixed in 1.2, but the docs on the Spark site still say that 2.10 is required. Thanks, Jon

Re: Does Spark 1.2.0 support Scala 2.11?

2014-12-19 Thread Gerard Maas
Check out the 'compiling for Scala 2.11' instructions: http://spark.apache.org/docs/1.2.0/building-spark.html#building-for-scala-211 -kr, Gerard. On Fri, Dec 19, 2014 at 12:00 PM, Jonathan Chayat jonatha...@supersonic.com wrote: The following ticket:

Re: Does Spark 1.2.0 support Scala 2.11?

2014-12-19 Thread Sean Owen
You might interpret that as 2.10+. Although 2.10 is still the main version in use, I think, you can see 2.11 artifacts have been published: http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.11%7C1.2.0%7Cjar On Fri, Dec 19, 2014 at 11:00 AM, Jonathan Chayat

Re: How to run an action and get output?‏

2014-12-19 Thread Sean Owen
To really be correct, I think you may have to use the foreach action to persist your data, since this isn't idempotent, and then read it again in a new RDD. You might get away with map as long as you can ensure that your write process is idempotent. On Fri, Dec 19, 2014 at 10:57 AM, ashic
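A minimal sketch of the two options Sean describes (SomeEntry, SomeOutput, Session, entries and the read-back path are placeholders taken from, or assumed around, the original question):

    // Option 1: map, acceptable only if session.SomeOp is idempotent
    val outputs = entries.map { e =>
      val result = session.SomeOp(e)
      SomeOutput(e.Key, result.SomeProp)
    }

    // Option 2 (safer for non-idempotent writes): do the side effect with an action,
    // then read whatever SomeOp wrote back as a new RDD
    entries.foreach { e => session.SomeOp(e) }
    val outputs2 = sc.textFile("/path/written/by/SomeOp")   // hypothetical location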

reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
Hi experts! What is an efficient way to read all files from a directory and its sub-directories using Spark? Currently I move all the files from the directory and its sub-directories into another temporary directory and then read them all using the sc.textFile method. But I want a method so that moving to

RE: How to run an action and get output?‏

2014-12-19 Thread Ashic Mahtab
Thanks Sean. That's kind of what I figured. Luckily, for my use case writes are idempotent, so map works. From: so...@cloudera.com Date: Fri, 19 Dec 2014 11:06:51 + Subject: Re: How to run an action and get output?‏ To: as...@live.com CC: user@spark.apache.org To really be correct, I

Re: reading files recursively using spark

2014-12-19 Thread Sean Owen
How about using the HDFS API to create a list of all the directories to read from, and passing them as a comma-joined string to sc.textFile? On Fri, Dec 19, 2014 at 11:13 AM, Hafiz Mujadid hafizmujadi...@gmail.com wrote: Hi experts! what is efficient way to read all files using spark from
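A rough sketch of that approach, assuming the Hadoop 2.x FileSystem API and a placeholder root path:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Collect a directory and all of its sub-directories, recursively
    def listDirs(p: Path): Seq[Path] = {
      val subDirs = fs.listStatus(p).filter(_.isDirectory).map(_.getPath).toSeq
      p +: subDirs.flatMap(listDirs)
    }

    val data = sc.textFile(listDirs(new Path("/data/root")).mkString(","))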

Scala Lazy values and partitions

2014-12-19 Thread Ashic Mahtab
Hi Guys, Are Scala lazy values instantiated once per executor, or once per partition? For example, if I have: object Something { lazy val context = create(); def foo(item) = context.doSomething(item) } and I do someRdd.foreach(Something.foo), then will context get instantiated once per

Re: reading files recursively using spark

2014-12-19 Thread madhu phatak
Hi, You can use Hadoop's FileInputFormat API together with Spark's newAPIHadoopFile to get recursion. More on the topic you can find here: http://stackoverflow.com/questions/8114579/using-fileinputformat-addinputpaths-to-recursively-add-hdfs-path On Fri, Dec 19, 2014 at 4:50 PM, Sean Owen
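A sketch of that route; the recursive flag name below assumes the Hadoop 2.x "new" MapReduce API:

    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Ask FileInputFormat to descend into sub-directories when listing input splits
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

    val lines = sc.newAPIHadoopFile("/data/root",
        classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
      .map { case (_, text) => text.toString }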

Re: too many small files and task

2014-12-19 Thread bethesda
I recently had the same problem. I'm not an expert but will suggest that you concatenate your files into a smaller number of larger files, e.g. in Linux: cat files > a_larger_file. This helped greatly. Likely others better qualified will weigh in on this later but that's something to get you

Re: reading files recursively using spark

2014-12-19 Thread bethesda
On HDFS I created: /one/one.txt # contains text one /one/two/two.txt # contains text two Then: val data = sc.textFile("/one/*") and data.collect This returned: Array(one, two) So the above path designation appears to automatically recurse for you. -- View this message in context:

Re: reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
Thanks bethesda! But if we have a structure like this: a/b/a.txt, a/c/c.txt, a/d/e/e.txt, then how can we handle this case? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/reading-files-recursively-using-spark-tp20782p20785.html Sent from the Apache Spark

Re: Scala Lazy values and partitions

2014-12-19 Thread Sean Owen
A val in an object should be instantiated once per JVM (really, ClassLoader, but probably won't make a difference here). Therefore I expect it is going to live effectively as long as the executor, across partitions but also across jobs. On Fri, Dec 19, 2014 at 11:21 AM, Ashic Mahtab

Re: Scala Lazy values and partitions

2014-12-19 Thread Gerard Maas
It will be instantiated once per VM, which translates to once per executor. -kr, Gerard. On Fri, Dec 19, 2014 at 12:21 PM, Ashic Mahtab as...@live.com wrote: Hi Guys, Are scala lazy values instantiated once per executor, or once per partition? For example, if I have: object Something =

300% Fraction Cached?

2014-12-19 Thread Yifan LI
Hi, I just saw that an Edge RDD is "300% Fraction Cached" in the Storage web UI. What does that mean? I could understand it if the value were under 100%… Thanks. Best, Yifan LI

Batch timestamp in spark streaming

2014-12-19 Thread nelson
Hi all, I know the topic has been discussed before, but I couldn't find an answer that suits me. How do you retrieve the current batch timestamp in Spark Streaming? Maybe via BatchInfo, but it does not seem to be linked to the streaming context or anything else... I currently have 1-minute micro-batch

Re: Batch timestamp in spark streaming

2014-12-19 Thread Sean Owen
Most of the methods of DStream will let you supply a function that receives a timestamp as an argument of type Time. For example, we have def foreachRDD(foreachFunc: RDD[T] => Unit) but also def foreachRDD(foreachFunc: (RDD[T], Time) => Unit) If you supply the latter, you will get the timestamp
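For example (dstream stands for whatever DStream the job already has):

    dstream.foreachRDD { (rdd, time) =>
      // `time` is the batch timestamp in milliseconds since the epoch
      println(s"Batch at ${time.milliseconds} ms contains ${rdd.count()} records")
    }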

Fetch Failure

2014-12-19 Thread bethesda
I have a job that runs fine on relatively small input datasets but then reaches a threshold where I begin to consistently get "Fetch failure" as the Failure Reason, late in the job, during a saveAsText() operation. The first error we are seeing on the Details for Stage page is ExecutorLostFailure

Re: pyspark exception catch

2014-12-19 Thread imazor
Hi, Thanks for the answer. Regarding 2 and 3, it's indeed the solution, but as I mentioned in my question, I can just as well do input checks (using .map) before applying any other RDD operations. I still think that's overhead. Regarding 1, this will make all the other RDD operations more complex, as I

RE: 300% Fraction Cached?

2014-12-19 Thread yana
There is a JIRA on this. According to the comment there, it means that a block is cached in more than one location. I don't know why this would happen (I used 1x replication when I saw this). Curious if someone has a more in-depth explanation. Sent on the new Sprint Network from my Samsung Galaxy

Querying Temp table using JDBC

2014-12-19 Thread shahab
Hi, According to Spark documentation the data sharing between two different Spark contexts is not possible. So I just wonder if it is possible to first run a job that loads some data from DB into Schema RDDs, then cache it and next register it as a temp table (let's say Table_1), now I would

Re: Fetch Failure

2014-12-19 Thread Jon Chase
I'm getting the same error (ExecutorLostFailure) - input RDD is 100k small files (~2MB each). I do a simple map, then keyBy(), and then rdd.saveAsHadoopDataset(...). Depending on the memory settings given to spark-submit, the time before the first ExecutorLostFailure varies (more memory ==

Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Haopu Wang
I'm using Spark 1.1.0 built for HDFS 2.4. My application enables checkpointing (to HDFS 2.5.1) and it builds fine. But when I run it, I get the error below: Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4 at

RE: Scala Lazy values and partitions

2014-12-19 Thread Ashic Mahtab
Just to confirm, once per VM means that it'll be the same instance across all applications in a particular JVM instance (i.e. executor). So even if the spark application is terminated, the instance will live on, correct? I think that's what Sean said, and it seems logical. From:

Re: Fetch Failure

2014-12-19 Thread sandy . ryza
Hi Jon, The fix for this is to increase spark.yarn.executor.memoryOverhead to something greater than its default of 384. This will increase the gap between the executor's heap size and what it requests from YARN. It's required because JVMs take up some memory beyond their heap size. -Sandy
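For instance, the setting can be baked into the SparkConf; the 1024 MB figure below is purely illustrative, not a recommendation:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("MyYarnApp")                              // hypothetical app name
      .set("spark.yarn.executor.memoryOverhead", "1024")    // MB reserved beyond the executor heap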

Re: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Sean Owen
Yes, but your error indicates that your application is actually using Hadoop 1.x of some kind. Check your dependencies, especially hadoop-client. On Fri, Dec 19, 2014 at 2:11 PM, Haopu Wang hw...@qilinsoft.com wrote: I’m using Spark 1.1.0 built for HDFS 2.4. My application enables check-point

Re: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Raghavendra Pandey
It seems there is hadoop 1 somewhere in the path. On Fri, Dec 19, 2014, 21:24 Sean Owen so...@cloudera.com wrote: Yes, but your error indicates that your application is actually using Hadoop 1.x of some kind. Check your dependencies, especially hadoop-client. On Fri, Dec 19, 2014 at 2:11

Re: Fetch Failure

2014-12-19 Thread Jon Chase
I'm actually already running 1.1.1. I also just tried --conf spark.yarn.executor.memoryOverhead=4096, but no luck. Still getting ExecutorLostFailure (executor lost). On Fri, Dec 19, 2014 at 10:43 AM, Rafal Kwasny rafal.kwa...@gmail.com wrote: Hi, Just upgrade to 1.1.1 - it was fixed some

Re: java.lang.ExceptionInInitializerError/Unable to load YARN support

2014-12-19 Thread Niranjan Reddy
Sean, Thanks for your response. My MapReduce and Spark 1.0 (prepackaged in CDH5) jobs are running fine. It's only Spark 1.2 jobs that I'm unable to run. NR On Dec 19, 2014 5:03 AM, Sean Owen so...@cloudera.com wrote: You've got Kerberos enabled, and it's complaining that YARN doesn't like

Re: Fetch Failure

2014-12-19 Thread Sandy Ryza
Do you hit the same errors? Is it now saying your containers are exceeding ~10 GB? On Fri, Dec 19, 2014 at 11:16 AM, Jon Chase jon.ch...@gmail.com wrote: I'm actually already running 1.1.1. I also just tried --conf spark.yarn.executor.memoryOverhead=4096, but no luck. Still getting

Re: Fetch Failure

2014-12-19 Thread Jon Chase
Hmmm, I see this a lot (multiple times per second) in the stdout logs of my application: 2014-12-19T16:12:35.748+: [GC (Allocation Failure) [ParNew: 286663K->12530K(306688K), 0.0074579 secs] 1470813K->1198034K(2063104K), 0.0075189 secs] [Times: user=0.03 sys=0.00, real=0.01 secs] And finally

Re: Having problem with Spark streaming with Kinesis

2014-12-19 Thread Ashrafuzzaman
Thanks Aniket, clears a lot of confusion. On Dec 14, 2014 7:11 PM, Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: The reason is because of the following code: val numStreams = numShards val kinesisStreams = (0 until numStreams).map { i => KinesisUtils.createStream(ssc, streamName,

Querying registered RDD (AsTable) using JDBC

2014-12-19 Thread shahab
Hi, Sorry for repeating the same question, just wanted to clarify the issue: is it possible to expose an RDD (or SchemaRDD) to external components (outside Spark) so it can be queried over JDBC? (My goal is not to place the RDD back in a database but to use this cached RDD to serve JDBC queries.)

Re: Fetch Failure

2014-12-19 Thread Jon Chase
Yes, same problem. On Fri, Dec 19, 2014 at 11:29 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Do you hit the same errors? Is it now saying your containers are exceed ~10 GB? On Fri, Dec 19, 2014 at 11:16 AM, Jon Chase jon.ch...@gmail.com wrote: I'm actually already running 1.1.1. I

Re: Querying registered RDD (AsTable) using JDBC

2014-12-19 Thread Evert Lammerts
Yes you can, using HiveContext, a metastore and the thriftserver. The metastore persists information about your SchemaRDD, and the HiveContext, initialised with information on the metastore, can interact with the metastore. The thriftserver provides JDBC connections using the metastore. Using

Re: When will spark 1.2 released?

2014-12-19 Thread Ted Yu
Looking at: http://search.maven.org/#browse%7C717101892 The dates of the jars were still of Dec 10th. Was I looking at the wrong place ? Cheers On Thu, Dec 18, 2014 at 11:10 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Yup, as he posted before, An Apache infrastructure issue prevented me

Re: When will spark 1.2 released?

2014-12-19 Thread Corey Nolet
The dates of the jars were still of Dec 10th. I figured that was because the jars were staged in Nexus on that date (before the vote). On Fri, Dec 19, 2014 at 12:16 PM, Ted Yu yuzhih...@gmail.com wrote: Looking at: http://search.maven.org/#browse%7C717101892 The dates of the jars were

Spark Streaming Threading Model

2014-12-19 Thread Asim Jalis
Q: In Spark Streaming if your DStream transformation and output action take longer than the batch duration will the system process the next batch in another thread? Or will it just wait until the first batch’s RDD is processed? In other words does it build up a queue of buffered RDDs awaiting

Is there a way (in Java) to turn Java Iterable into a JavaRDD?

2014-12-19 Thread Steve Lewis
I notice new methods such as JavaSparkContext.makeRDD (with few useful examples). It takes a Seq, but while there are ways to turn a List into a Seq, I see nothing that uses an Iterable.

Re: Spark Streaming Threading Model

2014-12-19 Thread Silvio Fiorito
Batches will wait for the previous batch to finish. The monitoring console will show you the backlog of waiting batches. From: Asim Jalis asimja...@gmail.com Date: Friday, December 19, 2014 at 1:16 PM To: user user@spark.apache.org Subject:

run spark on mesos localy

2014-12-19 Thread Nagy Istvan
registered with 20141219-110602-16777343-5050-658-0028 15:49:32.550 [Timer-0] WARN o.a.s.scheduler.TaskSchedulerImpl - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 15:49:47.547 [Timer-0] WARN

Re: Spark Streaming Threading Model

2014-12-19 Thread jay vyas
So, at any point does a stream stop producing RDDs? If not, is there a possibility, if the batching isn't working or is broken, that your disk / RAM will fill up to the brim with unprocessed RDD backlog? On Fri, Dec 19, 2014 at 1:29 PM, Silvio Fiorito silvio.fior...@granturing.com wrote:

spark/yarn ignoring num-executors (python, Amazon EMR, spark-submit, yarn-client)

2014-12-19 Thread Tim Schweichler
Hello, I'm experiencing an issue where yarn is scheduling two executors (the default) regardless of what I enter as num-executors when submitting an application. Background: I'm running Spark with Yarn on Amazon EMR. My cluster has two core nodes and three task nodes. All five nodes are

spark-shell bug with RDD distinct?

2014-12-19 Thread Jay Hutfles
Found a problem in the spark-shell, but can't confirm that it's related to open issues on Spark's JIRA page. I was wondering if anyone could help identify if this is an issue or if it's already being addressed. Test: (in spark-shell) case class Person(name: String, age: Int) val peopleList =

spark-shell bug with RDDs and case classes?

2014-12-19 Thread Jay Hutfles
Found a problem in the spark-shell, but can't confirm that it's related to open issues on Spark's JIRA page. I was wondering if anyone could help identify if this is an issue or if it's already being addressed. Test: (in spark-shell) case class Person(name: String, age: Int) val peopleList =

Re: Querying Temp table using JDBC

2014-12-19 Thread Michael Armbrust
This is experimental, but you can start the JDBC server from within your own programs https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45 by passing it the HiveContext. On Fri, Dec 19, 2014 at 6:04
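A minimal sketch of that approach (the data source and table name are placeholders; startWithContext is the experimental entry point the link points to):

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val hiveContext = new HiveContext(sc)
    val events = hiveContext.jsonFile("/data/events.json")   // hypothetical data source
    events.registerTempTable("Table_1")
    hiveContext.cacheTable("Table_1")

    // Expose the cached temp table over JDBC/ODBC from inside this application
    HiveThriftServer2.startWithContext(hiveContext)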

Re: spark-shell bug with RDDs and case classes?

2014-12-19 Thread Sean Owen
AFAIK it's a known issue of some sort in the Scala REPL, which is what the Spark REPL is. The PR that was closed was just adding tests to show it's a bug. I don't know if there is any workaround now. On Fri, Dec 19, 2014 at 7:21 PM, Jay Hutfles jayhutf...@gmail.com wrote: Found a problem in the

DAGScheduler StackOverflowError

2014-12-19 Thread David McWhorter
Hi all, I'm developing a spark application where I need to iteratively update an RDD over a large number of iterations (1000+). From reading online, I've found that I should use .checkpoint() to keep the graph from growing too large. Even when doing this, I keep getting
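A hedged sketch of the usual pattern (initial and step are stand-ins for the application's own logic); the key points are setting a checkpoint directory, checkpointing only every N iterations, and forcing an action so the checkpoint actually materializes and truncates the lineage:

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // any reliable directory

    var rdd = initial
    for (i <- 1 to 1000) {
      rdd = step(rdd).persist()
      if (i % 50 == 0) {
        rdd.checkpoint()
        rdd.count()   // action that materializes the checkpoint
      }
    }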

Any potentiail issue if I create a SparkContext in executor

2014-12-19 Thread Shuai Zheng
Hi All, I notice that if we create a SparkContext in the driver, we need to call the stop method to clean it up. SparkConf sparkConf = new SparkConf().setAppName("FinancialEngineExecutor"); JavaSparkContext ctx = new JavaSparkContext(sparkConf); . String

Hadoop 2.6 compatibility?

2014-12-19 Thread sa
Can Spark be built with Hadoop 2.6? The build instructions I see only go up to 2.4, and there does not seem to be a hadoop-2.6 profile. If it works with Hadoop 2.6, can anyone recommend how to build it? -- View this message in context:

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Andy Konwinski
Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator, so maybe it'll work now. However, I also sent two emails about this through the Nabble interface (in this same thread) yesterday and they don't appear to have made it through, so I'm not sure if it actually

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Ted Yu
You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 Cheers On Fri, Dec 19, 2014 at 12:51 PM, sa asuka.s...@gmail.com wrote: Can Spark be built with Hadoop 2.6? All I see instructions up to are for 2.4 and there does not seem to be a hadoop2.6 profile. If it works with Hadoop 2.6,

Yarn not running as many executors as I'd like

2014-12-19 Thread Jon Chase
Running on Amazon EMR with YARN and Spark 1.1.1, I have trouble getting YARN to use the number of executors that I specify in spark-submit: --num-executors 2 in a cluster with two core nodes will typically result in only one executor running at a time. I can play with the memory settings and

Re: Nabble mailing list mirror errors: This post has NOT been accepted by the mailing list yet

2014-12-19 Thread Ted Yu
Andy: I saw two emails from you from yesterday. See this thread: http://search-hadoop.com/m/JW1q5opRsY1 Cheers On Fri, Dec 19, 2014 at 12:51 PM, Andy Konwinski andykonwin...@gmail.com wrote: Yesterday, I changed the domain name in the mailing list archive settings to remove .incubator so

Re: does spark sql support columnar compression with encoding when caching tables

2014-12-19 Thread Sadhan Sood
Hey Michael, Thank you for clarifying that. Is Tachyon the right way to get compressed data in memory, or should we explore the option of adding compression to cached data? This is because our uncompressed data set is too big to fit in memory right now. I see the benefit of Tachyon not just with

Using Customized Hadoop InputFormat class with Spark Streaming

2014-12-19 Thread soroka21
Hello, I was successfully using my own customized Hadoop InputFormat class with JavaSparkContext.newAPIHadoopFile(...) Is there any way I can reuse my class in Spark Streaming? soroka21 -- View this message in context:

Re: does spark sql support columnar compression with encoding when caching tables

2014-12-19 Thread Michael Armbrust
Yeah, Tachyon does sound like a good option here. Especially if you have nested data, it's likely that Parquet in Tachyon will always be better supported. On Fri, Dec 19, 2014 at 2:17 PM, Sadhan Sood sadhan.s...@gmail.com wrote: Hey Michael, Thank you for clarifying that. Is tachyon the right

java.sql.SQLException: No suitable driver found

2014-12-19 Thread durga
Hi, I am facing an issue with the MySQL JARs with spark-submit. I am not running in YARN mode. spark-submit --jars $(echo mysql-connector-java-5.1.34-bin.jar | tr ' ' ',') --class com.abc.bcd.GetDBSomething myjar.jar abc bcd Any help is really appreciated. Thanks, -D 14/12/19 23:42:10 INFO

Re: Using Customized Hadoop InputFormat class with Spark Streaming

2014-12-19 Thread Michael Quinlan
Soroka, You should be able to use the fileStream() method of the JavaStreamingContext. In case you need something more custom, the code below is something I developed to provide the maximum functionality of the Scala method, but implemented in Java. //Set these to reflect your app and input format
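In Scala the equivalent call would look roughly like this (MyKey, MyValue and MyInputFormat stand for the existing custom key, value and InputFormat classes):

    val stream = ssc.fileStream[MyKey, MyValue, MyInputFormat]("/input/dir")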

RE: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Haopu Wang
My application doesn't depend on hadoop-client directly. It only depends on spark-core_2.10, which depends on hadoop-client 1.0.4. This can be checked in the Maven repository at http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.1.0 That's strange; how do I work around the

Re: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Marcelo Vanzin
On Fri, Dec 19, 2014 at 4:05 PM, Haopu Wang hw...@qilinsoft.com wrote: My application doesn’t depends on hadoop-client directly. It only depends on spark-core_2.10 which depends on hadoop-client 1.0.4. This can be checked by Maven repository at

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
To clarify, there isn't a Hadoop 2.6 profile per se but you can build using -Dhadoop.version=2.4 which works with Hadoop 2.6. On Fri, Dec 19, 2014 at 12:55 Ted Yu yuzhih...@gmail.com wrote: You can use hadoop-2.4 profile and pass -Dhadoop.version=2.6.0 Cheers On Fri, Dec 19, 2014 at 12:51

RE: Can Spark 1.1.0 save checkpoint to HDFS 2.5.1?

2014-12-19 Thread Haopu Wang
Hi Sean, I changed Spark to a provided dependency and declared hadoop-client 2.5.1 as a compile dependency. Now I see this error when doing "mvn package". Do you know what the reason could be? [INFO] --- scala-maven-plugin:3.1.3:compile (default) @ testspark --- [WARNING] Expected all

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Ted Yu
Here is the command I used: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests FYI On Fri, Dec 19, 2014 at 4:35 PM, Denny Lee denny.g@gmail.com wrote: To clarify, there isn't a Hadoop 2.6 profile per se but you can build using

Re: Hadoop 2.6 compatibility?

2014-12-19 Thread Denny Lee
Sorry Ted! I saw profile (-P) but missed the -D. My bad! On Fri, Dec 19, 2014 at 16:46 Ted Yu yuzhih...@gmail.com wrote: Here is the command I used: mvn package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -DskipTests FYI On Fri, Dec 19, 2014 at 4:35 PM, Denny

Re: does spark sql support columnar compression with encoding when caching tables

2014-12-19 Thread Sadhan Sood
Thanks Michael, that makes sense. On Fri, Dec 19, 2014 at 3:13 PM, Michael Armbrust mich...@databricks.com wrote: Yeah, tachyon does sound like a good option here. Especially if you have nested data, its likely that parquet in tachyon will always be better supported. On Fri, Dec 19, 2014

Re: Yarn not running as many executors as I'd like

2014-12-19 Thread Marcelo Vanzin
How many cores / memory do you have available per NodeManager, and how many cores / memory are you requesting for your job? Remember that in Yarn mode, Spark launches num executors + 1 containers. The extra container, by default, reserves 1 core and about 1g of memory (more if running in cluster

SchemaRDD to Hbase

2014-12-19 Thread Subacini B
Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase? Thanks Subacini
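As far as I know there is no direct API for this in Spark 1.2; a common route is to map the SchemaRDD's rows to HBase Puts and write them through TableOutputFormat. A hedged sketch, assuming the HBase 0.98 client API and a purely illustrative layout (row key in column 0, one string value in column 1; the table name and column family are placeholders):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.hadoop.mapreduce.Job
    import org.apache.spark.SparkContext._   // pair-RDD implicits in Spark 1.2

    val hConf = HBaseConfiguration.create()
    hConf.set(TableOutputFormat.OUTPUT_TABLE, "my_table")   // hypothetical table name
    val job = Job.getInstance(hConf)
    job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

    // Map each Row to an HBase Put; the column family/qualifier are placeholders
    val puts = schemaRDD.map { row =>
      val put = new Put(Bytes.toBytes(row.getString(0)))
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("c1"), Bytes.toBytes(row.getString(1)))
      (new ImmutableBytesWritable, put)
    }
    puts.saveAsNewAPIHadoopDataset(job.getConfiguration)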