Re: Unable to stop Worker in standalone mode by sbin/stop-all.sh

2015-03-12 Thread Ted Yu
Does the machine have cron job that periodically cleans up /tmp dir ? Cheers On Thu, Mar 12, 2015 at 6:18 PM, sequoiadb mailing-list-r...@sequoiadb.com wrote: Checking the script, it seems spark-daemon.sh unable to stop the worker $ ./spark-daemon.sh stop org.apache.spark.deploy.worker.Worker

Re: OutOfMemoryError when using DataFrame created by Spark SQL

2015-03-25 Thread Ted Yu
Can you try giving Spark driver more heap ? Cheers On Mar 25, 2015, at 2:14 AM, Todd Leo sliznmail...@gmail.com wrote: Hi, I am using Spark SQL to query on my Hive cluster, following Spark SQL and DataFrame Guide step by step. However, my HiveQL via sqlContext.sql() fails and

Re: Untangling dependency issues in spark streaming

2015-03-29 Thread Ted Yu
For Gradle, there are: https://github.com/musketyr/gradle-fatjar-plugin https://github.com/johnrengelman/shadow FYI On Sun, Mar 29, 2015 at 4:29 PM, jay vyas jayunit100.apa...@gmail.com wrote: thanks for posting this! Ive ran into similar issues before, and generally its a bad idea to swap

Re: [Spark Streaming] Disk not being cleaned up during runtime after RDD being processed

2015-03-29 Thread Ted Yu
Nathan: Please look in log files for any of the following: doCleanupRDD(): case e: Exception => logError("Error cleaning RDD " + rddId, e) doCleanupShuffle(): case e: Exception => logError("Error cleaning shuffle " + shuffleId, e) doCleanupBroadcast(): case e: Exception =>

Re: Build fails on 1.3 Branch

2015-03-29 Thread Ted Yu
Jenkins build failed too: https://amplab.cs.berkeley.edu/jenkins/view/Spark/job/Spark-1.3-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/326/consoleFull For the moment, you can apply the following change: diff --git

Re: Too many open files

2015-03-30 Thread Ted Yu
bq. In /etc/security/limits.conf set the next values: Have you done the above modification on all the machines in your Spark cluster ? If you use Ubuntu, be sure that the /etc/pam.d/common-session file contains the following line: session required pam_limits.so On Mon, Mar 30, 2015 at 5:08
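The limits the thread refers to are typically raised with entries like the following (a sketch only; the `spark` user name and the value 65536 are assumptions, and the right value depends on your workload):

```
# /etc/security/limits.conf -- raise the open-file limit for the user running Spark
spark  soft  nofile  65536
spark  hard  nofile  65536
```

After editing, start a fresh login session (and on Ubuntu make sure /etc/pam.d/common-session loads pam_limits.so, as noted above) so the new limits take effect.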

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-30 Thread Ted Yu
Nicolas: See if there was occurrence of the following exception in the log: errs => throw new SparkException( s"Couldn't connect to leader for topic ${part.topic} ${part.partition}: " + errs.mkString("\n")), Cheers On Mon, Mar 30, 2015 at 9:40 AM, Cody Koeninger

Re: Error in Delete Table

2015-03-31 Thread Ted Yu
Which Spark and Hive release are you using ? Thanks On Mar 27, 2015, at 2:45 AM, Masf masfwo...@gmail.com wrote: Hi. In HiveContext, when I put this statement DROP TABLE IF EXISTS TestTable If TestTable doesn't exist, spark returns an error: ERROR Hive:

Re: refer to dictionary

2015-03-31 Thread Ted Yu
You can use broadcast variable. See also this thread: http://search-hadoop.com/m/JW1q5GX7U22/Spark+broadcast+variablesubj=How+Broadcast+variable+scale+ On Mar 31, 2015, at 4:43 AM, Peng Xia sparkpeng...@gmail.com wrote: Hi, I have a RDD (rdd1)where each line is split into an array [a,
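The broadcast-variable pattern suggested above boils down to shipping a read-only lookup map to every task instead of joining against it. A minimal pure-Scala sketch of the lookup itself (the `dict` map and sample rows are invented for illustration; in a real Spark job one would wrap the map as `val bc = sc.broadcast(dict)` and read `bc.value` inside the closure):

```scala
// Hypothetical dictionary to broadcast: key -> id
val dict: Map[String, Int] = Map("a" -> 1, "b" -> 2)

// Each input line has already been split into an array; look the first field up.
val rows: Seq[Array[String]] = Seq(Array("a", "x"), Array("b", "y"), Array("c", "z"))

// In a Spark job this map would run on executors, reading bc.value instead of dict.
val resolved: Seq[(String, Int, String)] =
  rows.map(arr => (arr(0), dict.getOrElse(arr(0), -1), arr(1)))
```

Keeping the dictionary on the driver and broadcasting it avoids re-serializing it with every task closure.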

Re: Spark streaming with Kafka, multiple partitions fail, single partition ok

2015-03-31 Thread Ted Yu
10 --topic toto kafka-topics --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic toto-single I'm launching my Spark Streaming in local mode. @Ted Yu There's no log Couldn't connect to leader for topic, here's the full version : spark-submit --conf

Re: Spark 1.3.0 DataFrame and Postgres

2015-04-01 Thread Ted Yu
+1 on escaping column names. On Apr 1, 2015, at 5:50 AM, fergjo00 johngfergu...@gmail.com wrote: Question: --- Is there a way to have JDBC DataFrames use quoted/escaped column names? Right now, it looks like it sees the names correctly in the schema created but does not

Re: Spark 1.3 build with hive support fails on JLine

2015-04-01 Thread Ted Yu
Please invoke dev/change-version-to-2.11.sh before running mvn. Cheers On Mon, Mar 30, 2015 at 1:02 AM, Night Wolf nightwolf...@gmail.com wrote: Hey, Trying to build Spark 1.3 with Scala 2.11 supporting yarn hive (with thrift server). Running; *mvn -e -DskipTests -Pscala-2.11

Re: Spark streaming

2015-03-27 Thread Ted Yu
jamborta : Please also describe the format of your csv files. Cheers On Fri, Mar 27, 2015 at 6:42 AM, DW @ Gmail deanwamp...@gmail.com wrote: Show us the code. This shouldn't happen for the simple process you described Sent from my rotary phone. On Mar 27, 2015, at 5:47 AM, jamborta

Re: java.io.NotSerializableException: org.apache.hadoop.hbase.client.Result

2015-03-31 Thread Ted Yu
Jeetendra: Please extract the information you need from Result and return the extracted portion - instead of returning Result itself. Cheers On Tue, Mar 31, 2015 at 1:14 PM, Nan Zhu zhunanmcg...@gmail.com wrote: The example in

Re: Anyone has some simple example with spark-sql with spark 1.3

2015-03-28 Thread Ted Yu
Please take a look at https://spark.apache.org/docs/latest/sql-programming-guide.html Cheers On Mar 28, 2015, at 5:08 AM, Vincent He vincent.he.andr...@gmail.com wrote: I am learning spark sql and try spark-sql example, I running following code, but I got exception ERROR CliDriver:

Re: Anyone has some simple example with spark-sql with spark 1.3

2015-03-28 Thread Ted Yu
sample with scala or python, but for spark-sql shell, I can not get an exmaple running successfully, can you give me an example I can run with ./bin/spark-sql without writing any code? thanks On Sat, Mar 28, 2015 at 7:35 AM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at https

Re: Can't access file in spark, but can in hadoop

2015-03-28 Thread Ted Yu
there would be an exclusion in the pom to deal with this. Dale. From: Zhan Zhang zzh...@hortonworks.com Date: Friday, March 27, 2015 at 4:28 PM To: Johnson, Dale daljohn...@ebay.com Cc: Ted Yu yuzhih...@gmail.com, user user@spark.apache.org Subject: Re: Can't access file in spark, but can

Re: Spark-submit not working when application jar is in hdfs

2015-03-28 Thread Ted Yu
Looking at SparkSubmit#addJarToClasspath(): uri.getScheme match { case "file" | "local" => ... case _ => printWarning(s"Skip remote jar $uri.") It seems hdfs scheme is not recognized. FYI On Thu, Feb 26, 2015 at 6:09 PM, dilm dmend...@exist.com wrote: I'm trying to run a

Re: registerTempTable is not a member of RDD on spark 1.2?

2015-03-23 Thread Ted Yu
Have you tried adding the following ? import org.apache.spark.sql.SQLContext Cheers On Mon, Mar 23, 2015 at 6:45 AM, IT CTO goi@gmail.com wrote: Thanks. I am new to the environment and running cloudera CDH5.3 with spark in it. apparently when running in spark-shell this command val

Re: Convert Spark SQL table to RDD in Scala / error: value toFloat is a not a member of Any

2015-03-22 Thread Ted Yu
I thought of formation #1. But looks like when there're many fields, formation #2 is cleaner. Cheers On Sun, Mar 22, 2015 at 8:14 PM, Cheng Lian lian.cs@gmail.com wrote: You need either .map { row => (row(0).asInstanceOf[Float], row(1).asInstanceOf[Float], ...) } or .map { case
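Both formulations rely on the same cast-per-column idea, since a Row is positionally typed as Any. A self-contained sketch using a plain Seq[Any] as a stand-in for a Row (the field values are invented for illustration):

```scala
// A Row is essentially a positional Seq[Any]; casting recovers static types.
val row: Seq[Any] = Seq(1.5f, 2.5f)

// Formation #1: cast each column by position
val tuple: (Float, Float) =
  (row(0).asInstanceOf[Float], row(1).asInstanceOf[Float])

// Formation #2: pattern match with type patterns, cleaner with many fields
val sum: Float = row match {
  case Seq(a: Float, b: Float) => a + b
}
```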

Re: Spark 1.2. loses often all executors

2015-03-23 Thread Ted Yu
In this thread: http://search-hadoop.com/m/JW1q5DM69G I only saw two replies. Maybe some people forgot to use 'Reply to All' ? Cheers On Mon, Mar 23, 2015 at 8:19 AM, mrm ma...@skimlinks.com wrote: Hi, I have received three replies to my question on my personal e-mail, why don't they also

Re: Spark Error: Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@localhost:7077

2015-03-02 Thread Ted Yu
bq. Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@localhost:7077 There should be some more output following the above line. Can you post them ? Cheers On Mon, Mar 2, 2015 at 2:06 PM, Krishnanand Khambadkone kkhambadk...@yahoo.com.invalid wrote: Hi, I am

Re: Executing hive query from Spark code

2015-03-02 Thread Ted Yu
Here is snippet of dependency tree for spark-hive module: [INFO] org.apache.spark:spark-hive_2.10:jar:1.3.0-SNAPSHOT ... [INFO] +- org.spark-project.hive:hive-metastore:jar:0.13.1a:compile [INFO] | +- org.spark-project.hive:hive-shims:jar:0.13.1a:compile [INFO] | | +-

Re: Spark UI and running spark-submit with --master yarn

2015-03-02 Thread Ted Yu
Default RM Web UI port is 8088 (configurable through yarn.resourcemanager.webapp.address) Cheers On Mon, Mar 2, 2015 at 4:14 PM, Anupama Joshi anupama.jo...@gmail.com wrote: Hi Marcelo, Thanks for the quick reply. I have a EMR cluster and I am running the spark-submit on the master node in

Re: Spark Error: Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@localhost:7077

2015-03-02 Thread Ted Yu
-20150302155433- is FAILED On Monday, March 2, 2015 2:42 PM, Ted Yu yuzhih...@gmail.com wrote: bq. Cause was: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@localhost:7077 There should be some more output following the above line. Can you post them ? Cheers

Re: Resource manager UI for Spark applications

2015-03-03 Thread Ted Yu
bq. spark UI does not work for Yarn-cluster. Can you be a bit more specific on the error(s) you saw ? What Spark release are you using ? Cheers On Tue, Mar 3, 2015 at 8:53 AM, Rohini joshi roni.epi...@gmail.com wrote: Sorry , for half email - here it is again in full Hi , I have 2

Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Ted Yu
dataset we were using locally. It took me a couple days and digging through many logs to figure out this value was what was causing the problem. On Sat, Feb 28, 2015 at 11:38 AM, Ted Yu yuzhih...@gmail.com wrote: Having good out-of-box experience is desirable. +1 on increasing the default

Re: bitten by spark.yarn.executor.memoryOverhead

2015-03-02 Thread Ted Yu
, Ted Yu yuzhih...@gmail.com wrote: I have created SPARK-6085 with pull request: https://github.com/apache/spark/pull/4836 Cheers On Sat, Feb 28, 2015 at 12:08 PM, Corey Nolet cjno...@gmail.com wrote: +1 to a better default as well. We were working find until we ran against a real dataset

Re: Resource manager UI for Spark applications

2015-03-03 Thread Ted Yu
to the external one , but still does not work. Thanks _roni On Tue, Mar 3, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote: bq. spark UI does not work for Yarn-cluster. Can you be a bit more specific on the error(s) you saw ? What Spark release are you using ? Cheers On Tue, Mar 3, 2015 at 8:53

Re: Issue using S3 bucket from Spark 1.2.1 with hadoop 2.4

2015-03-03 Thread Ted Yu
If you can use hadoop 2.6.0 binary, you can use s3a s3a is being polished in the upcoming 2.7.0 release: https://issues.apache.org/jira/browse/HADOOP-11571 Cheers On Tue, Mar 3, 2015 at 9:44 AM, Ankur Srivastava ankur.srivast...@gmail.com wrote: Hi, We recently upgraded to Spark 1.2.1 -

Re: spark sql median and standard deviation

2015-03-04 Thread Ted Yu
Please take a look at DoubleRDDFunctions.scala : /** Compute the mean of this RDD's elements. */ def mean(): Double = stats().mean /** Compute the variance of this RDD's elements. */ def variance(): Double = stats().variance /** Compute the standard deviation of this RDD's elements.
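The stats() behind mean/variance/stdev computes population statistics (sampleVariance/sampleStdev are separate methods). A pure-Scala sketch of the same formulas, runnable without a cluster; the sample values are invented:

```scala
// Population mean, variance, and standard deviation, as DoubleRDDFunctions defines them.
val xs: Seq[Double] = Seq(1.0, 2.0, 3.0, 4.0)

val mean: Double     = xs.sum / xs.size
val variance: Double = xs.map(x => math.pow(x - mean, 2)).sum / xs.size
val stdev: Double    = math.sqrt(variance)
```

Median is the harder one, since it requires a sort (or an approximation) rather than a single pass.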

Re: Driver disassociated

2015-03-04 Thread Ted Yu
What release are you using ? SPARK-3923 went into 1.2.0 release. Cheers On Wed, Mar 4, 2015 at 1:39 PM, Thomas Gerber thomas.ger...@radius.com wrote: Hello, sometimes, in the *middle* of a job, the job stops (status is then seen as FINISHED in the master). There isn't anything wrong in

Re: Where can I find more information about the R interface forSpark?

2015-03-04 Thread Ted Yu
Please follow SPARK-5654 On Wed, Mar 4, 2015 at 7:22 PM, Haopu Wang hw...@qilinsoft.com wrote: Thanks, it's an active project. Will it be released with Spark 1.3.0? -- *From:* 鹰 [mailto:980548...@qq.com] *Sent:* Thursday, March 05, 2015 11:19 AM *To:*

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Ted Yu
Please add the following to build command: -Djackson.version=1.9.3 Cheers On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist tsind...@gmail.com wrote: I am running Spark on a HortonWorks HDP Cluster. I have deployed there prebuilt version but it is only for Spark 1.2.0 not 1.2.1 and there are a few

Re: Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread Ted Yu
to read multiple HDFS files into RDD. What I am doing now is: for each file I read them into a RDD. Then later on I union all these RDDs into one RDD. I am not sure if it is the best way to do it. Thanks Senqiang On Tuesday, March 3, 2015 2:40 PM, Ted Yu yuzhih...@gmail.com wrote

Re: Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread Ted Yu
:00 Ted Yu yuzhih...@gmail.com: Looking at FileInputFormat#listStatus(): // Whether we need to recursive look into the directory structure boolean recursive = job.getBoolean(INPUT_DIR_RECURSIVE, false); where: public static final String INPUT_DIR_RECURSIVE

Re: Does sc.newAPIHadoopFile support multiple directories (or nested directories)?

2015-03-03 Thread Ted Yu
Looking at scaladoc: /** Get an RDD for a Hadoop file with an arbitrary new API InputFormat. */ def newAPIHadoopFile[K, V, F : NewInputFormat[K, V]] Your conclusion is confirmed. On Tue, Mar 3, 2015 at 1:59 PM, S. Zhou myx...@yahoo.com.invalid wrote: I did some experiments and it seems
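The INPUT_DIR_RECURSIVE flag quoted in the sibling reply is a plain Hadoop job property, so it can be set on the configuration passed to newAPIHadoopFile. A sketch of the property (for the new mapreduce API; the older mapred API used mapred.input.dir.recursive):

```
# Hadoop job configuration -- recurse into subdirectories of input paths
mapreduce.input.fileinputformat.input.dir.recursive = true
```

With this set, a single newAPIHadoopFile call over the parent directory can replace reading each file into its own RDD and unioning them.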

Re: bitten by spark.yarn.executor.memoryOverhead

2015-02-28 Thread Ted Yu
Having good out-of-box experience is desirable. +1 on increasing the default. On Sat, Feb 28, 2015 at 8:27 AM, Sean Owen so...@cloudera.com wrote: There was a recent discussion about whether to increase or indeed make configurable this kind of default fraction. I believe the suggestion

Re: Unable to find org.apache.spark.sql.catalyst.ScalaReflection class

2015-02-28 Thread Ted Yu
Have you verified that spark-catalyst_2.10 jar was in the classpath ? Cheers On Sat, Feb 28, 2015 at 9:18 AM, Ashish Nigam ashnigamt...@gmail.com wrote: Hi, I wrote a very simple program in scala to convert an existing RDD to SchemaRDD. But createSchemaRDD function is throwing exception

Re: Tools to manage workflows on Spark

2015-02-28 Thread Ted Yu
Here was latest modification in spork repo: Mon Dec 1 10:08:19 2014 Not sure if it is being actively maintained. On Sat, Feb 28, 2015 at 6:26 PM, Qiang Cao caoqiang...@gmail.com wrote: Thanks for the pointer, Ashish! I was also looking at Spork https://github.com/sigmoidanalytics/spork

Re: unsafe memory access in spark 1.2.1

2015-03-01 Thread Ted Yu
Environment (rhel-2.5.4.0.el6_6-x86_64 u75-b13) OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode) Thanks *From:* Ted Yu [mailto:yuzhih...@gmail.com] *Sent:* Sunday, March 01, 2015 10:18 PM *To:* Zalzberg, Idan (Agoda) *Cc:* user@spark.apache.org *Subject:* Re: unsafe memory access

Re: unsafe memory access in spark 1.2.1

2015-03-01 Thread Ted Yu
What Java version are you using ? Thanks On Sun, Mar 1, 2015 at 7:03 AM, Zalzberg, Idan (Agoda) idan.zalzb...@agoda.com wrote: Hi, I am using spark 1.2.1, sometimes I get these errors sporadically: Any thought on what could be the cause? Thanks 2015-02-27 15:08:47 ERROR

Re: How to parse Json formatted Kafka message in spark streaming

2015-03-05 Thread Ted Yu
Cui: You can check messages.partitions.size to determine whether messages is an empty RDD. Cheers On Thu, Mar 5, 2015 at 12:52 AM, Akhil Das ak...@sigmoidanalytics.com wrote: When you use KafkaUtils.createStream with StringDecoders, it will return String objects inside your messages stream.

Re: How to integrate HBASE on Spark

2015-02-23 Thread Ted Yu
Installing hbase on hadoop cluster would allow hbase to utilize features provided by hdfs, such as short circuit read (See '90.2. Leveraging local data' under http://hbase.apache.org/book.html#perf.hdfs). Cheers On Sun, Feb 22, 2015 at 11:38 PM, Akhil Das ak...@sigmoidanalytics.com wrote: If

Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Ted Yu
. The question is really, how to get the classloader visibility right. It depends on where you need these classes. Have you looked into spark.files.userClassPathFirst and spark.yarn.user.classpath.first ? On Wed, Feb 25, 2015 at 5:34 AM, Ted Yu yuzhih...@gmail.com wrote: bq. depend on missing

Re: [Spark SQL]: Convert SchemaRDD back to RDD

2015-02-22 Thread Ted Yu
Haven't found the method in http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.SchemaRDD The new DataFrame has this method: /** * Returns the content of the [[DataFrame]] as an [[RDD]] of [[Row]]s. * @group rdd */ def rdd: RDD[Row] = { FYI On Sun, Feb

Re: Posting to the list

2015-02-22 Thread Ted Yu
bq. i didnt get any new subscription mail in my inbox. Have you checked your Spam folder ? Cheers On Sun, Feb 22, 2015 at 2:36 PM, hnahak harihar1...@gmail.com wrote: I'm also facing the same issue, this is third time whenever I post anything it never accept by the community and at the same

Re: Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-22 Thread Ted Yu
bq. bash: git: command not found Looks like the AMI doesn't have git pre-installed. Cheers On Sun, Feb 22, 2015 at 4:29 PM, olegshirokikh o...@solver.com wrote: I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) using the following: ./ec2/spark-ec2 --key-pair=***

Re: Getting to proto buff classes in Spark Context

2015-02-23 Thread Ted Yu
bq. Caused by: java.lang.ClassNotFoundException: com.rick.reports.Reports$SensorReports Is Reports$SensorReports class in rick-processors-assembly-1.0.jar ? Thanks On Mon, Feb 23, 2015 at 8:43 PM, necro351 . necro...@gmail.com wrote: Hello, I am trying to deserialize some data encoded

Re: InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag

2015-02-23 Thread Ted Yu
bq. have installed hadoop on a local virtual machine Can you tell us the release of hadoop you installed ? What Spark release are you using ? Or be more specific, what hadoop release was the Spark built against ? Cheers On Mon, Feb 23, 2015 at 9:37 PM, fanooos dev.fano...@gmail.com wrote: Hi

Re: Does Spark Streaming depend on Hadoop?

2015-02-23 Thread Ted Yu
Can you pastebin the whole stack trace ? Thanks On Feb 23, 2015, at 6:14 PM, bit1...@163.com bit1...@163.com wrote: Hi, When I submit a spark streaming application with following script, ./spark-submit --name MyKafkaWordCount --master local[20] --executor-memory 512M

Re: Getting to proto buff classes in Spark Context

2015-02-23 Thread Ted Yu
$Builder.class 10640 Mon Feb 23 17:34:46 PST 2015 com/defend7/reports/Reports$SensorReports.class 815 Mon Feb 23 17:34:46 PST 2015 com/defend7/reports/Reports$SensorReportsOrBuilder.class On Mon Feb 23 2015 at 8:57:18 PM Ted Yu yuzhih...@gmail.com wrote: bq. Caused

Re: Running out of space (when there's no shortage)

2015-02-24 Thread Ted Yu
Here is a tool which may give you some clue: http://file-leak-detector.kohsuke.org/ Cheers On Tue, Feb 24, 2015 at 11:04 AM, Vladimir Rodionov vrodio...@splicemachine.com wrote: Usually it happens in Linux when application deletes file w/o double checking that there are no open FDs (resource

Re: Perf Prediction

2015-02-21 Thread Ted Yu
Can you be a bit more specific ? Are you asking about performance across Spark releases ? Cheers On Sat, Feb 21, 2015 at 6:38 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: Hi, Has some performance prediction work been done on Spark? Thank You

Re: Query data in Spark RRD

2015-02-21 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD ? Cheers On Sat, Feb 21, 2015 at 4:24 AM, Nikhil Bafna nikhil.ba...@flipkart.com wrote: Hi. My use case is building a realtime monitoring system over multi-dimensional data. The way

Re: upgrade to Spark 1.2.1

2015-02-25 Thread Ted Yu
Could this be caused by Spark using shaded Guava jar ? Cheers On Wed, Feb 25, 2015 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote: Getting an error that confuses me. Running a largish app on a standalone cluster on my laptop. The app uses a guava HashBiMap as a broadcast value. With

Re: Spark excludes fastutil dependencies we need

2015-02-25 Thread Ted Yu
Maybe drop the exclusion for parquet-provided profile ? Cheers On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote: Inline On Wed, Feb 25, 2015 at 1:53 PM, Ted Yu yuzhih...@gmail.com wrote: Interesting. Looking at SparkConf.scala : val configs = Seq

Re: Spark excludes fastutil dependencies we need

2015-02-24 Thread Ted Yu
bq. depend on missing fastutil classes like Long2LongOpenHashMap Looks like Long2LongOpenHashMap should be added to the shaded jar. Cheers On Tue, Feb 24, 2015 at 7:36 PM, Jim Kleckner j...@cloudphysics.com wrote: Spark includes the clearspring analytics package but intentionally excludes

Re: How to tell if one RDD depends on another

2015-02-26 Thread Ted Yu
bq. whether or not rdd1 is a cached rdd RDD has getStorageLevel method which would return the RDD's current storage level. SparkContext has this method: * Return information about what RDDs are cached, if they are in mem or on disk, how much space * they take, etc. */ @DeveloperApi

Re: JettyUtils.createServletHandler Method not Found?

2015-03-27 Thread Ted Yu
JettyUtils is marked with: private[spark] object JettyUtils extends Logging { FYI On Fri, Mar 27, 2015 at 9:50 AM, kmader kevin.ma...@gmail.com wrote: I have a very strange error in Spark 1.3 where at runtime in the org.apache.spark.ui.JettyUtils object the method createServletHandler is not

Re: RDD equivalent of HBase Scan

2015-03-26 Thread Ted Yu
In examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala, TableInputFormat is used. TableInputFormat accepts parameter public static final String SCAN = "hbase.mapreduce.scan"; where if specified, Scan object would be created from String form: if (conf.get(SCAN) != null) {

Re: Which RDD operations preserve ordering?

2015-03-26 Thread Ted Yu
This is related: https://issues.apache.org/jira/browse/SPARK-6340 On Thu, Mar 26, 2015 at 5:58 AM, sergunok ser...@gmail.com wrote: Hi guys, I don't have exact picture about preserving of ordering of elements of RDD after executing of operations. Which operations preserve it? 1) Map

Re: Can't access file in spark, but can in hadoop

2015-03-26 Thread Ted Yu
Looks like the following assertion failed: Preconditions.checkState(storageIDsCount == locs.size()); locs is List&lt;DatanodeInfoProto&gt; Can you enhance the assertion to log more information ? Cheers On Thu, Mar 26, 2015 at 3:06 PM, Dale Johnson daljohn...@ebay.com wrote: There seems to be a

Re: Building spark 1.2 from source requires more dependencies

2015-03-26 Thread Ted Yu
Looking at output from dependency:tree, servlet-api is brought in by the following: [INFO] +- org.apache.cassandra:cassandra-all:jar:1.2.6:compile [INFO] | +- org.antlr:antlr:jar:3.2:compile [INFO] | +- com.googlecode.json-simple:json-simple:jar:1.1:compile [INFO] | +-

Re: JAVA_HOME problem with upgrade to 1.3.0

2015-03-19 Thread Ted Yu
JAVA_HOME, an environment variable, should be defined on the node where appattempt_1420225286501_4699_02 ran. Cheers On Thu, Mar 19, 2015 at 8:59 AM, Williams, Ken ken.willi...@windlogics.com wrote: I’m trying to upgrade a Spark project, written in Scala, from Spark 1.2.1 to 1.3.0, so I

Re: StorageLevel: OFF_HEAP

2015-03-18 Thread Ted Yu
18, 2015 at 9:53 AM, Ranga sra...@gmail.com wrote: Thanks for the information. Will rebuild with 0.6.0 till the patch is merged. On Tue, Mar 17, 2015 at 7:24 PM, Ted Yu yuzhih...@gmail.com wrote: Ranga: Take a look at https://github.com/apache/spark/pull/4867 Cheers On Tue, Mar 17, 2015

Re: Bulk insert strategy

2015-03-08 Thread Ted Yu
What's the expected number of partitions in your use case ? Have you thought of doing batching in the workers ? Cheers On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman ashrafuzzaman...@gmail.com wrote: While processing DStream in the Spark Programming Guide, the suggested usage of

Re: Spark error NoClassDefFoundError: org/apache/hadoop/mapred/InputSplit

2015-03-23 Thread Ted Yu
InputSplit is in hadoop-mapreduce-client-core jar Please check that the jar is in your classpath. Cheers On Mon, Mar 23, 2015 at 8:10 AM, , Roy rp...@njit.edu wrote: Hi, I am using CDH 5.3.2 packages installation through Cloudera Manager 5.3.2 I am trying to run one spark job with

Re: JDBC DF using DB2

2015-03-23 Thread Ted Yu
bq. is to modify compute_classpath.sh on all worker nodes to include your driver JARs. Please follow the above advice. Cheers On Mon, Mar 23, 2015 at 12:34 PM, Jack Arenas j...@ckarenas.com wrote: Hi Team, I’m trying to create a DF using jdbc as detailed here

Re: Spark log shows only this line repeated: RecurringTimer - JobGenerator] DEBUG o.a.s.streaming.util.RecurringTimer - Callback for JobGenerator called at time X

2015-03-26 Thread Ted Yu
It is logged from RecurringTimer#loop(): private def loop() { try { while (!stopped) { clock.waitTillTime(nextTime) callback(nextTime) prevTime = nextTime nextTime += period logDebug("Callback for " + name + " called at time " + prevTime) } }

Re: Hive UDAF percentile_approx says This UDAF does not support the deprecated getEvaluator() method.

2015-01-13 Thread Ted Yu
Looking at the source code for AbstractGenericUDAFResolver, the following (non-deprecated) method should be called: public GenericUDAFEvaluator getEvaluator(GenericUDAFParameterInfo info) It is called by hiveUdfs.scala (master branch): val parameterInfo = new

Re: Is it possible to use json4s 3.2.11 with Spark 1.3.0?

2015-03-23 Thread Ted Yu
Looking at core/pom.xml : <dependency> <groupId>org.json4s</groupId> <artifactId>json4s-jackson_${scala.binary.version}</artifactId> <version>3.2.10</version> </dependency> The version is hard coded. You can rebuild Spark 1.3.0 with json4s 3.2.11 Cheers On Mon, Mar 23, 2015 at 2:12

Re: pyspark hbase range scan

2015-04-01 Thread Ted Yu
Have you looked at http://happybase.readthedocs.org/en/latest/ ? Cheers On Apr 1, 2015, at 4:50 PM, Eric Kimbrel eric.kimb...@soteradefense.com wrote: I am attempting to read an hbase table in pyspark with a range scan. conf = { hbase.zookeeper.quorum: host,

Re: Connection pooling in spark jobs

2015-04-02 Thread Ted Yu
http://docs.oracle.com/cd/B10500_01/java.920/a96654/connpoca.htm The question doesn't seem to be Spark specific, btw On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Hi, We have a case that we will have to run concurrent jobs (for the same algorithm) on

Re: 答复 (Reply): maven compile error

2015-04-03 Thread Ted Yu
Can you include -X in your maven command and pastebin the output ? Cheers On Apr 3, 2015, at 3:58 AM, myelinji myeli...@aliyun.com wrote: Thank you for your reply. When I'm using maven to compile the whole project, the erros as follows [INFO] Spark Project Parent POM

Re: About Waiting batches on the spark streaming UI

2015-04-03 Thread Ted Yu
Maybe add another stat for batches waiting in the job queue ? Cheers On Fri, Apr 3, 2015 at 10:01 AM, Tathagata Das t...@databricks.com wrote: Very good question! This is because the current code is written such that the ui considers a batch as waiting only when it has actually started being

Re: Spark SQL vs map reduce tableInputOutput

2015-04-20 Thread Ted Yu
performance of application? Regards Jeetendra On 20 April 2015 at 20:49, Ted Yu yuzhih...@gmail.com wrote: To my knowledge, Spark SQL currently doesn't provide range scan capability against hbase. Cheers On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com wrote: HI All

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-23 Thread Ted Yu
NativeS3FileSystem class is in hadoop-aws jar. Looks like it was not on classpath. Cheers On Thu, Apr 23, 2015 at 7:30 AM, Sujee Maniyam su...@sujee.net wrote: Thanks all... btw, s3n load works without any issues with spark-1.3.1-bulit-for-hadoop 2.4 I tried this on 1.3.1-hadoop26

Re: Slower performance when bigger memory?

2015-04-23 Thread Ted Yu
Shuai: Please take a look at: http://blog.takipi.com/garbage-collectors-serial-vs-parallel-vs-cms-vs-the-g1-and-whats-new-in-java-8/ On Apr 23, 2015, at 10:18 AM, Dean Wampler deanwamp...@gmail.com wrote: JVM's often have significant GC overhead with heaps bigger than 64GB. You might try

Re: implicits is not a member of org.apache.spark.sql.SQLContext

2015-04-21 Thread Ted Yu
Have you tried the following ? import sqlContext._ import sqlContext.implicits._ Cheers On Tue, Apr 21, 2015 at 7:54 AM, Wang, Ningjun (LNG-NPV) ningjun.w...@lexisnexis.com wrote: I tried to convert an RDD to a data frame using the example codes on spark website case class

Re: Meet Exception when learning Broadcast Variables

2015-04-21 Thread Ted Yu
Does line 27 correspond to brdcst.value ? Cheers On Apr 21, 2015, at 3:19 AM, donhoff_h 165612...@qq.com wrote: Hi, experts. I wrote a very little program to learn how to use Broadcast Variables, but met an exception. The program and the exception are listed as following. Could

Re: spark 1.3.1 : unable to access s3n:// urls (no file system for scheme s3n:)

2015-04-22 Thread Ted Yu
This thread from hadoop mailing list should give you some clue: http://search-hadoop.com/m/LgpTk2df7822 On Wed, Apr 22, 2015 at 9:45 AM, Sujee Maniyam su...@sujee.net wrote: Hi all I am unable to access s3n:// urls using sc.textFile().. getting 'no file system for scheme s3n://' error.

Re: Auto Starting a Spark Job on Cluster Starup

2015-04-22 Thread Ted Yu
This thread seems related: http://search-hadoop.com/m/JW1q51W02V Cheers On Wed, Apr 22, 2015 at 6:09 AM, James King jakwebin...@gmail.com wrote: What's the best way to start-up a spark job as part of starting-up the Spark cluster. I have an single uber jar for my job and want to make the

Re: Spark Performance on Yarn

2015-04-22 Thread Ted Yu
In master branch, overhead is now 10%. That would be 500 MB FYI On Apr 22, 2015, at 8:26 AM, nsalian neeleshssal...@gmail.com wrote: +1 to executor-memory to 5g. Do check the overhead space for both the driver and the executor as per Wilfred's suggestion. Typically, 384 MB should

Re: Parquet error reading data that contains array of structs

2015-04-24 Thread Ted Yu
Yin: Fix Version of SPARK-4520 is not set. I assume it was fixed in 1.3.0 Cheers On Fri, Apr 24, 2015 at 11:00 AM, Yin Huai yh...@databricks.com wrote: The exception looks like the one mentioned in https://issues.apache.org/jira/browse/SPARK-4520. What is the version of Spark?

Re: ORCFiles

2015-04-24 Thread Ted Yu
Please see SPARK-2883 There is no Fix Version yet. On Fri, Apr 24, 2015 at 5:45 PM, David Mitchell jdavidmitch...@gmail.com wrote: Does anyone know in which version of Spark will there be support for ORCFiles via spark.sql.hive? Will it be in 1.4? David

Re: Spark SQL 1.3.1: java.lang.ClassCastException is thrown

2015-04-25 Thread Ted Yu
Looks like this is related: https://issues.apache.org/jira/browse/SPARK-5456 On Sat, Apr 25, 2015 at 6:59 AM, doovs...@sina.com wrote: Hi all, When I query Postgresql based on Spark SQL like this: dataFrame.registerTempTable(Employees) val emps = sqlContext.sql(select name,

Re: How to debug Spark on Yarn?

2015-04-23 Thread Ted Yu
For step 2, you can pipe application log to a file instead of copy-pasting. Cheers On Apr 22, 2015, at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: I submit a spark app to YARN and i get these messages 15/04/22 22:45:04 INFO yarn.Client: Application report for

Re: Spark SQL vs map reduce tableInputOutput

2015-04-20 Thread Ted Yu
To my knowledge, Spark SQL currently doesn't provide range scan capability against hbase. Cheers On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com wrote: HI All, I am Querying Hbase and combining result and using in my spake job. I am querying hbase using Hbase

Re: compliation error

2015-04-19 Thread Ted Yu
What JDK release are you using ? Can you give the complete command you used ? Which Spark branch are you working with ? Cheers On Sun, Apr 19, 2015 at 7:25 PM, Brahma Reddy Battula brahmareddy.batt...@huawei.com wrote: Hi All Getting following error, when I am compiling spark..What did I

Re: HBase HTable constructor hangs

2015-04-28 Thread Ted Yu
Can you give us more information ? Such as hbase release, Spark release. If you can pastebin jstack of the hanging HTable process, that would help. BTW I used http://search-hadoop.com/?q=spark+HBase+HTable+constructor+hangs and saw a very old thread with this subject. Cheers On Tue, Apr 28,

Re: hive-thriftserver maven artifact

2015-04-28 Thread Ted Yu
Credit goes to Misha Chernetsov (see SPARK-4925) FYI On Tue, Apr 28, 2015 at 8:25 AM, Marco marco@gmail.com wrote: Thx Ted for the info ! 2015-04-27 23:51 GMT+02:00 Ted Yu yuzhih...@gmail.com: This is available for 1.3.1: http://mvnrepository.com/artifact/org.apache.spark/spark-hive

Re: HBase HTable constructor hangs

2015-04-28 Thread Ted Yu
How did you distribute hbase-site.xml to the nodes ? Looks like HConnectionManager couldn't find the hbase:meta server. Cheers On Tue, Apr 28, 2015 at 9:19 PM, Tridib Samanta tridib.sama...@live.com wrote: I am using Spark 1.2.0 and HBase 0.98.1-cdh5.1.0. Here is the jstack trace. Complete

Re: Too many open files when using Spark to consume messages from Kafka

2015-04-29 Thread Ted Yu
Can you run the command 'ulimit -n' to see the current limit ? To configure ulimit settings on Ubuntu, edit */etc/security/limits.conf* Cheers On Wed, Apr 29, 2015 at 2:07 PM, Bill Jay bill.jaypeter...@gmail.com wrote: Hi all, I am using the direct approach to receive real-time data from
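Two quick checks relevant to the advice above: the shell's current limit, and (on Linux) how many descriptors a given process actually holds. The PID variable is a placeholder:

```shell
# Current soft limit on open files for this shell session
ulimit -n

# How many file descriptors a running process holds (Linux; PID is a placeholder)
# ls /proc/"$PID"/fd | wc -l
```

Comparing the second number against the first as the streaming job runs helps tell a genuinely low limit apart from a descriptor leak.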

Re: Too many open files when using Spark to consume messages from Kafka

2015-04-29 Thread Ted Yu
(sql) } catch { case e: Exception => logger.error(e.getMessage()) } finally { if (conn != null) { conn.close } } } I am not sure whether the leakage originates from Kafka connector or the sql connections. Bill On Wed, Apr 29, 2015 at 2:12 PM, Ted

Re: real time Query engine Spark-SQL on Hbase

2015-04-30 Thread Ted Yu
bq. a single query on one filter criteria Can you tell us more about your filter ? How selective is it ? Which hbase release are you using ? Cheers On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale siddharth.ub...@syncoms.com wrote: Hi, I want to use Spark as Query engine on HBase with

Re: spark with standalone HBase

2015-04-30 Thread Ted Yu
) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) On Thu, Apr 30, 2015 at 1:25 PM, Saurabh Gupta saurabh.gu...@semusi.com wrote: I am using hbase -0.94.8. On Wed, Apr 29, 2015 at 11:56 PM, Ted Yu yuzhih...@gmail.com

Re: hive-thriftserver maven artifact

2015-04-27 Thread Ted Yu
This is available for 1.3.1: http://mvnrepository.com/artifact/org.apache.spark/spark-hive-thriftserver_2.10 FYI On Mon, Feb 16, 2015 at 7:24 AM, Marco marco@gmail.com wrote: Ok, so will it be only available for the next version (1.30)? 2015-02-16 15:24 GMT+01:00 Ted Yu yuzhih

Re: Exception in using updateStateByKey

2015-04-27 Thread Ted Yu
Which hadoop release are you using ? Can you check hdfs audit log to see who / when deleted spark/ck/hdfsaudit/receivedData/0/log-1430139541443-1430139601443 ? Cheers On Mon, Apr 27, 2015 at 6:21 AM, Sea 261810...@qq.com wrote: Hi, all: I use function updateStateByKey in Spark Streaming, I

Re: Spark - Hive Metastore MySQL driver

2015-05-02 Thread Ted Yu
Can you try the patch from: [SPARK-6913][SQL] Fixed java.sql.SQLException: No suitable driver found Cheers On Sat, Mar 28, 2015 at 12:41 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: This is from my Hive installation -sh-4.1$ ls /apache/hive/lib | grep derby derby-10.10.1.1.jar
