Re: Exception handling in Spark

2020-05-05 Thread Todd Nist
Path("")) if (fileExists) println("File exists!") else println("File doesn't exist!") Not sure that will help you or not, just a thought. -Todd On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh wrote: > Thanks Brandon! > > i should have remembere

Re: Using P4J Plugins with Spark

2020-04-21 Thread Todd Nist
You may want to make sure you include the jar of P4J and your plugins as part of the following so that both the driver and executors have access. If HDFS is out then you could make a common mount point on each of the executor nodes so they have access to the classes. - spark-submit --jars
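A sketch of what such a submission might look like; the class name and jar paths are hypothetical:

# Ship the P4J jar and the plugin jars to both the driver and the executors
spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --jars /opt/libs/p4j.jar,/opt/libs/my-plugins.jar \
  my-app.jar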

Re: spark.submit.deployMode: cluster

2019-03-29 Thread Todd Nist
A little late, but have you looked at https://livy.incubator.apache.org/? It works well for us. -Todd On Thu, Mar 28, 2019 at 9:33 PM Jason Nerothin wrote: > Meant this one: https://docs.databricks.com/api/latest/jobs.html > > On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote: &g

Re: cache table vs. parquet table performance

2019-01-16 Thread Todd Nist
Hi Tomas, Have you considered using something like https://www.alluxio.org/ for your cache? Seems like a possible solution for what you're trying to do. -Todd On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote: > Hi ,Tomas. > Thanks for your question give me some prompt.But the best way use

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist
the maxRatePerPartition and backpressure.enabled. I thought that maxRate was not applicable when using back pressure, but I may be mistaken. -Todd On Thu, Jul 26, 2018 at 8:46 AM Biplob Biswas wrote: > Hi Todd, > > Thanks for the reply. I have the maxRatePerPartition set as well. Below >

Re: Backpressure initial rate not working

2018-07-26 Thread Todd Nist
tch* when the backpressure mechanism is > enabled. If you set the maxRatePerPartition and apply the above formula, I believe you will be able to achieve the results you are looking for. HTH. -Todd On Thu, Jul 26, 2018 at 7:21 AM Biplob Biswas wrote: > Did anyone face similar issue

Re: Tableau BI on Spark SQL

2017-01-30 Thread Todd Nist
well for us. We did the extract route originally, but with the native Exasol connector it is just as performant as the extract. HTH. -Todd On Mon, Jan 30, 2017 at 10:15 PM, Jörn Franke <jornfra...@gmail.com> wrote: > With a lot of data (TB) it is not that good, hence the extraction. &g

Re: is there any bug for the configuration of spark 2.0 cassandra spark connector 2.0 and cassandra 3.0.8

2016-09-20 Thread Todd Nist
-compatibility The JIRA, https://datastax-oss.atlassian.net/browse/SPARKC/, does not seem to show any outstanding issues with regards to 3.0.8 and 2.0 of Spark or Spark Cassandra Connector. HTH. -Todd On Tue, Sep 20, 2016 at 1:47 AM, muhammet pakyürek <mpa...@hotmail.com> wrote: > > &

Re: Is there such thing as cache fusion with the underlying tables/files on HDFS

2016-09-17 Thread Todd Nist
Hi Mich, Have you looked at Apache Ignite? https://apacheignite-fs.readme.io/docs. This looks like something that may be what you're looking for: http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin HTH. -Todd On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh

Re: Creating HiveContext within Spark streaming

2016-09-08 Thread Todd Nist
sparkContext = new SparkContext(sparkConf)* val HiveContext = new HiveContext(streamingContext.sparkContext) HTH. -Todd On Thu, Sep 8, 2016 at 9:11 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Ok I managed to sort that one out. > > This is what I am facing > >
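Pieced together from the snippet above, a minimal sketch; the app name and batch interval are assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("StreamingWithHive")
val streamingContext = new StreamingContext(sparkConf, Seconds(10))
// Reuse the SparkContext owned by the StreamingContext instead of creating a second one
val hiveContext = new HiveContext(streamingContext.sparkContext)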

Re: Design patterns involving Spark

2016-08-30 Thread Todd Nist
Have not tried this, but looks quite useful if one is using Druid: https://github.com/implydata/pivot - An interactive data exploration UI for Druid On Tue, Aug 30, 2016 at 4:10 AM, Alonso Isidoro Roman wrote: > Thanks Mitch, i will check it. > > Cheers > > > Alonso

Re: Writing to Hbase table from Spark

2016-08-30 Thread Todd Nist
Have you looked at spark-packages.org? There are several different HBase connectors there, not sure if any meet your need or not. https://spark-packages.org/?q=hbase HTH, -Todd On Tue, Aug 30, 2016 at 5:23 AM, ayan guha <guha.a...@gmail.com> wrote: > You can use rdd level new hado

Spark Job Doesn't End on Mesos

2016-08-09 Thread Todd Leo
-4372-0034' However, the process doesn’t quit after all. This is critical, because I’d like to use SparkLauncher to submit such jobs. If my job doesn’t end, jobs will pile up and fill up the memory. Pls help. :-| — BR, Todd Leo

Re: HiveThriftServer2.startWithContext no more showing tables in 1.6.2

2016-07-21 Thread Todd Nist
-cant-find-my-tables-in-spark-sql-using-beeline.html HTH. -Todd On Thu, Jul 21, 2016 at 10:30 AM, Marco Colombo <ing.marco.colo...@gmail.com > wrote: > Thanks. > > That is just a typo. I'm using on 'spark://10.0.2.15:7077' (standalone). > Same url used in --master in spark-submit

Re: Load selected rows with sqlContext in the dataframe

2016-07-21 Thread Todd Nist
You can set the dbtable to this: .option("dbtable", "(select * from master_schema where 'TID' = '100_0')") HTH, Todd On Thu, Jul 21, 2016 at 10:59 AM, sujeet jog <sujeet@gmail.com> wrote: > I have a table of size 5GB, and want to load selective rows into d
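In context, a sketch of the full read; the connection details are placeholders, and depending on the database the subquery may also need an alias:

val df = sqlContext.read
  .format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("user", username)
  .option("password", pwd)
  // push the row selection down to the database as a subquery
  .option("dbtable", "(select * from master_schema where TID = '100_0') t")
  .load()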

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
Streaming within its checkpoints by default. You can also manage them yourself if desired. How are you dealing with offsets ? Can you verify the offsets on the broker: kafka-run-class.sh kafka.tools.GetOffsetShell --topic --broker-list --time -1 -Todd On Tue, Jun 7, 2016 at 8:17 AM, Dominik

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
What version of Spark are you using? I do not believe that 1.6.x is compatible with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2 and 0.9.0.x. See this for more information: https://issues.apache.org/jira/browse/SPARK-12177 -Todd On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric

Re:how to config spark thrift jdbc server high available

2016-05-23 Thread Todd
There is a jira that works on spark thrift server HA; the patch works, but it still hasn't been merged into the master branch. At 2016-05-23 20:10:26, "qmzhang" <578967...@qq.com> wrote: >Dear guys, please help... > >In hive,we can enable hiveserver2 high available by using dynamic service

Re:why spark 1.6 use Netty instead of Akka?

2016-05-23 Thread Todd
As far as I know, there would be Akka version conflict issues when using Akka as a Spark Streaming source. At 2016-05-23 21:19:08, "Chaoqiang" wrote: >I want to know why spark 1.6 use Netty instead of Akka? Is there some >difficult problems which Akka can not

Re:Re: How spark depends on Guava

2016-05-23 Thread Todd
r" <m...@schaffer.me> wrote: I got curious so I tried sbt dependencyTree. Looks like Guava comes into spark core from a couple places. -Mat matschaffer.com On Mon, May 23, 2016 at 2:32 PM, Todd <bit1...@163.com> wrote: Can someone please take alook at my question?I am

Re:How spark depends on Guava

2016-05-22 Thread Todd
Can someone please take a look at my question? I am using spark-shell in local mode and yarn-client mode. Spark code uses the guava library, so spark should have guava in place during runtime. Thanks. At 2016-05-23 11:48:58, "Todd" <bit1...@163.com> wrote: Hi, In the spark code, guava

How spark depends on Guava

2016-05-22 Thread Todd
Hi, In the spark code, the guava maven dependency scope is provided. My question is, how does spark depend on guava during runtime? I looked into the spark-assembly-1.6.1-hadoop2.6.1.jar, and didn't find class entries like com.google.common.base.Preconditions etc...

Does spark support Apache Arrow

2016-05-19 Thread Todd
From the official site http://arrow.apache.org/, Apache Arrow is used for Columnar In-Memory storage. I have two quick questions: 1. Does spark support Apache Arrow? 2. When dataframe is cached in memory, the data are saved in columnar in-memory style. What is the relationship between this

Does Structured Streaming support Kafka as data source?

2016-05-18 Thread Todd
Hi, I briefly went through the spark code, and it looks like structured streaming doesn't support kafka as a data source yet?

Re: Unit testing framework for Spark Jobs?

2016-05-18 Thread Todd Nist
Perhaps these may be of some use: https://github.com/mkuthan/example-spark http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/ https://github.com/holdenk/spark-testing-base On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote: > Hi Lars, > > Do you have

Re:Re: Re: How to change output mode to Update

2016-05-17 Thread Todd
s queries // outputMode() is used for continuous queries assertNotStreaming("mode() can only be called on non-continuous queries") this.mode = saveMode this } On Wed, May 18, 2016 at 12:25 PM, Todd <bit1...@163.com> wrote: Thanks Ted. I didn't try, but I think SaveMode and OuputM

How to change output mode to Update

2016-05-17 Thread Todd
scala> records.groupBy("name").count().write.trigger(ProcessingTime("30 seconds")).option("checkpointLocation", "file:///home/hadoop/jsoncheckpoint").startStream("file:///home/hadoop/jsonresult") org.apache.spark.sql.AnalysisException: Aggregations are not supported on streaming

How to use Kafka as data source for Structured Streaming

2016-05-17 Thread Todd
Hi, I am wondering whether structured streaming supports Kafka as a data source. I briefly went through the source code (mainly the parts related to the DataSourceRegister trait), and didn't find any kafka data source there. Thanks.

Re:Re: Does Structured Streaming support count(distinct) over all the streaming data?

2016-05-17 Thread Todd
ByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval)) HTH Dr Mich Talebzadeh LinkedIn https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com On 17 May 2016 at 20:02, Michael Armbrust <mich...@databricks.

Does Structured Streaming support count(distinct) over all the streaming data?

2016-05-17 Thread Todd
Hi, We have a requirement to do count(distinct) in a processing batch against all the streaming data (e.g., the last 24 hours' data); that is, when we do count(distinct), we actually want to compute the distinct count against the last 24 hours' data. Does structured streaming support this scenario? Thanks!

Re:Re: Code Example of Structured Streaming of 2.0

2016-05-17 Thread Todd
Thanks Ted! At 2016-05-17 16:16:09, "Ted Yu" <yuzhih...@gmail.com> wrote: Please take a look at: [SPARK-13146][SQL] Management API for continuous queries [SPARK-14555] Second cut of Python API for Structured Streaming On Mon, May 16, 2016 at 11:46 PM, Todd <bit1...@

Code Example of Structured Streaming of 2.0

2016-05-17 Thread Todd
Hi, Are there code examples about how to use the structured streaming feature? Thanks.

Re: Spark SQL Transaction

2016-04-23 Thread Todd Nist
, it will issue the commit: if (supportsTransactions) { conn.commit() } HTH -Todd On Sat, Apr 23, 2016 at 8:57 AM, Andrés Ivaldi <iaiva...@gmail.com> wrote: > Hello, so I executed Profiler and found that implicit isolation was turn > on by JDBC driver, this is the default behavior of MSSQL

Re: Spark 1.6.1. How to prevent serialization of KafkaProducer

2016-04-21 Thread Todd Nist
Have you looked at these: http://allegro.tech/2015/08/spark-kafka-integration.html http://mkuthan.github.io/blog/2016/01/29/spark-kafka-integration2/ Full example here: https://github.com/mkuthan/example-spark-kafka HTH. -Todd On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego <ag

Re: How to change akka.remote.startup-timeout in spark

2016-04-21 Thread Todd Nist
I believe you can adjust it by setting the following: spark.akka.timeout 100s Communication timeout between Spark nodes. HTH. -Todd On Thu, Apr 21, 2016 at 9:49 AM, yuemeng (A) <yueme...@huawei.com> wrote: > When I run a spark application,sometimes I get follow ERROR: > > 16
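For instance, a sketch of setting it programmatically in Spark 1.x (the app name is a placeholder):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")
  .set("spark.akka.timeout", "100s")  // communication timeout between Spark nodes
val sc = new SparkContext(conf)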

Re: Apache Flink

2016-04-17 Thread Todd Nist
e as complex event processing > engine. https://stratio.atlassian.net/wiki/display/DECISION0x9/Home I have not used it, only read about it but it may be of some interest to you. -Todd On Sun, Apr 17, 2016 at 5:49 PM, Peyman Mohajerian <mohaj...@gmail.com> wrote: > Microbatching

What's the benefit of RDD checkpoint against RDD save

2016-03-23 Thread Todd
Hi, I have a long computing chain and get the last RDD after a series of transformations. I have two choices for this last RDD: 1. Call checkpoint on the RDD to materialize it to disk 2. Call RDD.saveXXX to save it to HDFS, and read it back for further processing. I would ask which choice
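A sketch of the two choices; the record type, paths, and input RDD are hypothetical stand-ins for the long computing chain:

import org.apache.spark.{SparkConf, SparkContext}

case class MyRecord(id: Long, value: String)

val sc = new SparkContext(new SparkConf().setAppName("CheckpointVsSave"))
val lastRdd = sc.parallelize(Seq(MyRecord(1L, "a")))  // stands in for the transformed RDD

// Choice 1: checkpoint, which truncates the lineage and materializes to the checkpoint dir
sc.setCheckpointDir("hdfs:///tmp/checkpoints")
lastRdd.cache()       // avoid computing the long chain twice
lastRdd.checkpoint()
lastRdd.count()       // the first action triggers the actual checkpoint write

// Choice 2: save explicitly to HDFS and read it back for further processing
lastRdd.saveAsObjectFile("hdfs:///tmp/lastRdd")
val reloaded = sc.objectFile[MyRecord]("hdfs:///tmp/lastRdd")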

Re: "bootstrapping" DStream state

2016-03-10 Thread Todd Nist
n / interval)) val counts = eventsStream.map(event => { (event.timestamp - event.timestamp % interval, event) }).updateStateByKey[Long](PrintEventCountsByInterval.counter _, new HashPartitioner(3), initialRDD = initialRDD) counts.print() HTH. -Todd On Thu, Mar 10, 2016 at 1:35 AM, Zalzber

Re: Spark Streaming, very slow processing and increasing scheduling delay of kafka input stream

2016-03-10 Thread Todd Nist
(KafkaUtils.createDirectStream) or Receiver (KafkaUtils.createStream)? You may find this discussion of value on SO: http://stackoverflow.com/questions/28901123/org-apache-spark-shuffle-metadatafetchfailedexception-missing-an-output-locatio -Todd On Mon, Mar 7, 2016 at 5:52 PM, Vinti Maheshwari <vint

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Todd Nist
-apache-spark/ Not sure if that is of value to you or not. HTH. -Todd On Tue, Mar 1, 2016 at 7:30 PM, Don Drake <dondr...@gmail.com> wrote: > I'm interested in building a REST service that utilizes a Spark SQL > Context to return records from a DataFrame (or IndexedRDD?) and even

Re: Spark for client

2016-03-01 Thread Todd Nist
ZendCon 2014 *->* IPython notebook <https://www.youtube.com/watch?v=2AX6g0tK-us=youtu.be=37m42s> running the Spark Kernel underneath HTH. Todd On Tue, Mar 1, 2016 at 4:10 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote: > Thanks Mohannad. > > Installed Anaconda 3 that cont

Re: Spark Integration Patterns

2016-02-28 Thread Todd Nist
cluster ? > Am I missing something obvious ? > > > Le dim. 28 févr. 2016 à 19:01, Todd Nist <tsind...@gmail.com> a écrit : > >> Define your SparkConfig to set the master: >> >> val conf = new SparkConf().setAppName(AppName) >> .setMaster(SparkMaster)

Re: Spark Integration Patterns

2016-02-28 Thread Todd Nist
7". Then when you create the SparkContext, pass the SparkConf to it: val sparkContext = new SparkContext(conf) Then use the sparkContext for interact with the SparkMaster / Cluster. Your program basically becomes the driver. HTH. -Todd On Sun, Feb 28, 2016 at 9:25 AM, mms <moshir.

Re: Saving Kafka Offsets to Cassandra at begining of each batch in Spark Streaming

2016-02-16 Thread Todd Nist
You could use the "withSessionDo" of the SparkCassandrConnector to preform the simple insert: CassandraConnector(conf).withSessionDo { session => session.execute(....) } -Todd On Tue, Feb 16, 2016 at 11:01 AM, Cody Koeninger <c...@koeninger.org> wrote: > You co

Re:Hive on Spark knobs

2016-01-28 Thread Todd
Did you run hive on spark with spark 1.5 and hive 1.1? I think hive on spark doesn't support spark 1.5. There are compatibility issues. At 2016-01-28 01:51:43, "Ruslan Dautkhanov" wrote: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started

Compile error when compiling spark 2.0.0 snapshot code base in IDEA

2016-01-27 Thread Todd
Hi, I am able to maven install the whole spark project (from github) in my IDEA. But when I run the SparkPi example, IDEA compiles the code again and the following exception is thrown. Has someone met this problem? Thanks a lot. Error:scalac: while compiling:

How data locality is honored when spark is running on yarn

2016-01-27 Thread Todd
Hi, I am kind of confused about how data locality is honored when spark is running on yarn (client or cluster mode). Can someone please elaborate on this? Thanks!

Re: Passing binding variable in query used in Data Source API

2016-01-21 Thread Todd Nist
.option("user", username) .option("password", pwd) .option("driver", "org.postgresql.Driver") .option("dbtable", "schema.table1") .load().filter('dept_number === $deptNo) This is form the top of my head and the code has not be

Re: NPE when using Joda DateTime

2016-01-14 Thread Todd Nist
xtends KryoRegistrator { override def registerClasses(kryo: Kryo) { kryo.register(classOf[org.joda.time.DateTime], new JodaDateTimeSerializer) kryo.register(classOf[org.joda.time.Interval], new JodaIntervalSerializer) } } HTH. -Todd On Thu, Jan 14, 2016 at 9:28 AM, Spencer, Alex (Santander
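To wire a registrator like the one above into Spark, something along these lines should work; the class name is hypothetical, and JodaDateTimeSerializer/JodaIntervalSerializer come from the third-party kryo-serializers library:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.JodaKryoRegistrator")  // the registrator shown above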

Re: GroupBy on DataFrame taking too much time

2016-01-11 Thread Todd Nist
) .option("user", username) .option("password", pwd) .option("driver", "driverClassNameHere") .option("dbtable", query) .load() Not sure if that's what your looking for or not. HTH. -Todd On Mon, Jan 11, 2016 at 3:47 AM, Gaini Rajeshwar

Re: write new data to mysql

2016-01-08 Thread Todd Nist
Sorry, did not see your update until now. On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Yasemin, > > What version of Spark are you using? Here is the reference, it is off of > the DataFrame > https://spark.apache.org/docs/lates

Re: write new data to mysql

2016-01-08 Thread Todd Nist
park.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrame.html> out into external storage. It is the very last method defined there in the api docs. HTH. -Todd On Fri, Jan 8, 2016 at 2:27 PM, Yasemin Kaya <godo...@gmail.com> wrote: > Hi, > There is no write function

Re: write new data to mysql

2016-01-08 Thread Todd Nist
(MYSQL_CONNECTION_URL_WRITE, "track_on_alarm", connectionProps) HTH. -Todd On Fri, Jan 8, 2016 at 10:53 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Which Spark release are you using ? > > For case #2, was there any error / clue in the logs ? > > Cheers > > On Fri, Jan 8
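Filling in around the snippet, a sketch of the write; df and MYSQL_CONNECTION_URL_WRITE are from the original post, and the credentials are placeholders:

import java.util.Properties
import org.apache.spark.sql.SaveMode

val connectionProps = new Properties()
connectionProps.put("user", "dbuser")
connectionProps.put("password", "dbpass")

df.write
  .mode(SaveMode.Append)  // append the new rows instead of overwriting the table
  .jdbc(MYSQL_CONNECTION_URL_WRITE, "track_on_alarm", connectionProps)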

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
That should read "I think you're missing the --name option". Sorry about that. On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Jade, > > I think you "--name" option. The makedistribution should look like this: > > ./make-distr

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
Tests HTH. -Todd On Wed, Jan 6, 2016 at 2:20 PM, Jade Liu <jade@nor1.com> wrote: > I’ve changed the scala version to 2.10. > > With this command: > build/mvn -X -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean > package > Build was successful. > &

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
: "10.10.5", arch: "x86_64", family: "mac" On Wed, Jan 6, 2016 at 3:27 PM, Jade Liu <jade@nor1.com> wrote: > Hi, Todd: > > Thanks for your suggestion. Yes I did run the > ./dev/change-scala-version.sh 2.11 script when using scala version 2.11. > &

Re: looking for a easier way to count the number of items in a JavaDStream

2015-12-16 Thread Todd Nist
any collects(), just to obtain the count of records on the DStream. HTH. -Todd On Wed, Dec 16, 2015 at 3:34 PM, Bryan Cutler <cutl...@gmail.com> wrote: > To follow up with your other issue, if you are just trying to count > elements in a DStream, you can do that without an Acc

Re: Securing objects on the thrift server

2015-12-15 Thread Todd Nist
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in 1.6. On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib < younes.nag...@tritondigital.com> wrote: > The one coming with spark 1.5.2. > > > > y > > > > *From:* Ted Yu [mailto:yuzhih...@gmail.com] > *Sent:* December-15-15 1:59 PM

Re: Questions on Kerberos usage with YARN and JDBC

2015-12-11 Thread Todd Simmer
in Windows DNS and what it's pointing at. Can you do a kinit *username * on that host? It should tell you if it can find the KDC. Let me know if that's helpful at all. Todd On Fri, Dec 11, 2015 at 1:50 PM, Mike Wright <mwri...@snl.com> wrote: > As part of our implementation, we are

Re: [Spark Streaming] How to clear old data from Stream State?

2015-11-25 Thread Todd Nist
the idle state as being timed out, and call the tracking * function with State[S].isTimingOut() = true. */ def timeout(duration: Duration): this.type -Todd On Wed, Nov 25, 2015 at 8:00 AM, diplomatic Guru <diplomaticg...@gmail.com> wrote: > Hello, > > I know how I coul

Re: Spark Driver Port Details

2015-11-25 Thread Todd Nist
this: val conf = new SparkConf().setAppName(s"YourApp").set("spark.ui.port", "4080") val sc = new SparkContext(conf) While there is a rest api to return you information on the application, http://yourserver:8080/api/v1/applications, it does not return the port used by t

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
/SparkListener.html . HTH, -Todd On Tue, Nov 24, 2015 at 4:50 PM, Abhishek Anand <abhis.anan...@gmail.com> wrote: > Hi , > > I need to get the batch time of the active batches which appears on the UI > of spark streaming tab, > > How can this be achieved in Java ? > > BR, > Abhi >

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
(StreamingListenerBatchSubmitted batchSubmitted) { system.out.println("Start time: " + batchSubmitted.batchInfo.processingStartTime) } Sorry for the confusion. -Todd On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Abhi, > > You s

How 'select name,age from TBL_STUDENT where age = 37' is optimized when caching it

2015-11-16 Thread Todd
Hi, When I cache the dataframe and run the query, val df = sqlContext.sql("select name,age from TBL_STUDENT where age = 37") df.cache() df.show println(df.queryExecution) I got the following execution plan. From the optimized logical plan, I can see the whole analyzed logical

How to use --principal and --keytab in SparkSubmit

2015-11-08 Thread Todd
Hi, I am starting the spark thrift server with the following script, ./start-thriftserver.sh --master yarn-client --driver-memory 1G --executor-memory 2G --driver-cores 2 --executor-cores 2 --num-executors 4 --hiveconf hive.server2.thrift.port=10001 --hiveconf

[Spark R]could not allocate memory (2048 Mb) in C function 'R_AllocStringBuffer'

2015-11-06 Thread Todd
I am launching spark R with the following script: ./sparkR --driver-memory 12G and I try to load a local 3G csv file with the following code, > a=read.transactions("/home/admin/datamining/data.csv",sep="\t",format="single",cols=c(1,2)) but I encounter an error: could not allocate memory (2048 Mb) in

Required file not found: sbt-interface.jar

2015-11-02 Thread Todd
Hi, I am trying to build spark 1.5.1 in my environment, but encounter the following error complaining Required file not found: sbt-interface.jar: The error message is below and I am building with: ./make-distribution.sh --name spark-1.5.1-bin-2.6.0 --tgz --with-tachyon -Phadoop-2.6

Re: Maven build failed (Spark master)

2015-10-27 Thread Todd Nist
. FWIW, the environment was an MBP with OS X 10.10.5 and Java: java version "1.8.0_51" Java(TM) SE Runtime Environment (build 1.8.0_51-b16) Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode) -Todd On Tue, Oct 27, 2015 at 12:17 PM, Ted Yu <yuzhih...@gmail.com>

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
state, is there a provided 2.11 tgz available as well? I did not think there was, if there is then should the documentation on the download site be changed to reflect this? Sorry for the confusion. -Todd On Sun, Oct 25, 2015 at 4:07 PM, Sean Owen <so...@cloudera.com> wrote: > No,

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
. There are some limitations see this, http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211, for what is not supported. HTH, -Todd On Sun, Oct 25, 2015 at 10:56 AM, Bilinmek Istemiyor <benibi...@gmail.com> wrote: > > I am just starting out apache spark. I hava zero kno

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
Sorry Sean, you are absolutely right, it supports 2.11; all I meant is that there is no release available as a standard download and that one has to build it. Thanks for the clarification. -Todd On Sunday, October 25, 2015, Sean Owen <so...@cloudera.com> wrote: > Hm, why do you say it doesn'

Re: java.lang.NegativeArraySizeException? as iterating a big RDD

2015-10-23 Thread Todd Nist
you attempt to serialize. Increase this if you get a "buffer limit exceeded" exception inside Kryo. -Todd On Fri, Oct 23, 2015 at 6:51 AM, Yifan LI <iamyifa...@gmail.com> wrote: > Thanks for your advice, Jem. :) > > I will increase the partitioning and see if it he

Re: Spark SQL Thriftserver and Hive UDF in Production

2015-10-19 Thread Todd Nist
From Tableau, you should be able to use the Initial SQL option to support this: So in Tableau add the following to the “Initial SQL” create function myfunc AS 'myclass' using jar 'hdfs:///path/to/jar'; HTH, Todd On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar <deenar.toras...@gma

Re: KafkaProducer using Cassandra as source

2015-09-23 Thread Todd Nist
Hi Kali, If you do not mind sending JSON, you could do something like this, using json4s: val rows = p.collect() map ( row => TestTable(row.getString(0), row.getString(1)) ) val json = parse(write(rows)) producer.send(new KeyedMessage[String, String]("trade", writePretty(json))) // or for
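A slightly more complete sketch of the same idea; p (an RDD of Cassandra rows) and producer are from the original post, the TestTable shape is assumed from the snippet, and json4s needs implicit Formats in scope:

import org.json4s.NoTypeHints
import org.json4s.native.Serialization
import org.json4s.native.Serialization.writePretty
import kafka.producer.KeyedMessage

implicit val formats = Serialization.formats(NoTypeHints)

case class TestTable(id: String, name: String)

val rows = p.collect() map (row => TestTable(row.getString(0), row.getString(1)))
producer.send(new KeyedMessage[String, String]("trade", writePretty(rows)))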

Re: Replacing Esper with Spark Streaming?

2015-09-14 Thread Todd Nist
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi CEP engine. I have not used the below, but they may be of some value to you: http://stratio.github.io/streaming-cep-engine/ https://github.com/Stratio/streaming-cep-engine HTH. -Todd On Sun, Sep 13, 2015 at 7:49 PM

Re:Re: RE: Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-13 Thread Todd
code generation could introduce slowness. On 2015-09-11 15:58, Cheng, Hao wrote: Can you confirm if the query really runs in cluster mode, not local mode? Can you print the call stack of the executor when the query is running?

Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
e;” in Spark 1.5, and run the query again? In our previous testing, it’s about 20% slower for sort merge join. I am not sure if there anything else slow down the performance. Hao From: Jesse F Chen [mailto:jfc...@us.ibm.com] Sent: Friday, September 11, 2015 1:18 PM To: Michael Armbrus

Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
5, and it’s true by default, but we found it probably causes the performance to drop dramatically. From: Todd [mailto:bit1...@163.com] Sent: Friday, September 11, 2015 2:17 PM To: Cheng, Hao Cc: Jesse F Chen; Michael Armbrust; user@spark.apache.org Subject: Re:RE: spark 1.5 SQL slows down dr

Re:Re:RE: Re:RE: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-11 Thread Todd
', there is no table to show queries and execution plan information. At 2015-09-11 14:39:06, "Todd" <bit1...@163.com> wrote: Thanks Hao. Yes, it is still slow with SMJ. Let me try the option you suggested. At 2015-09-11 14:34:46, "Cheng, Hao" <hao.ch...@intel.com> w

spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
Hi, I am using data generated with spark-sql-perf (https://github.com/databricks/spark-sql-perf) to test the spark sql performance (spark on yarn, with 10 nodes) with the following code (the table store_sales is about 90 million records, 6G in size) val

Re: Tungsten and Spark Streaming

2015-09-10 Thread Todd Nist
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming -Todd On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh < gurvinder.si...@uninett.no> wrote: > On 09/10/2015 07:42 AM, Tathagata Das wrote: > > Rewriting is necessary

Re:Re: spark 1.5 SQL slows down dramatically by 50%+ compared with spark 1.4.1 SQL

2015-09-10 Thread Todd
oth runs would be helpful whenever reporting performance changes. On Thu, Sep 10, 2015 at 1:24 AM, Todd <bit1...@163.com> wrote: Hi, I am using data generated with sparksqlperf(https://github.com/databricks/spark-sql-perf) to test the spark sql performance (spark on yarn, with 10 nodes

BlockNotFoundException when running spark word count on Tachyon

2015-08-26 Thread Todd
I am using Tachyon in the Spark program below, but I encounter a BlockNotFoundException. Does someone know what's wrong, and is there a guide on how to configure Spark to work with Tachyon? Thanks! conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
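For reference, in the Spark 1.5/1.6 era the external block store was used through OFF_HEAP persistence; a sketch, assuming conf is the SparkConf from the post and rdd is the RDD to cache:

import org.apache.spark.storage.StorageLevel

conf.set("spark.externalBlockStore.url", "tachyon://10.18.19.33:19998")
// ... build the SparkContext from conf and create the RDD, then:
rdd.persist(StorageLevel.OFF_HEAP)  // blocks go to Tachyon rather than executor memory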

Re:Re:Re: How to increase data scale in Spark SQL Perf

2015-08-26 Thread Todd
Sorry for the noise, it's my bad... I have worked it out now. At 2015-08-26 13:20:57, Todd bit1...@163.com wrote: I think the answer is No. I only see such messages on the console... and #2 is the thread stack trace. What I am thinking is that Spark SQL Perf forks many dsdgen processes to generate

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-26 Thread Todd
Increase the number of executors, :-) At 2015-08-26 16:57:48, Ted Yu yuzhih...@gmail.com wrote: Mind sharing how you fixed the issue ? Cheers On Aug 26, 2015, at 1:50 AM, Todd bit1...@163.com wrote: Sorry for the noise, It's my bad...I have worked it out now. At 2015-08-26 13:20:57

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
. Are you able to get more detailed error message ? Thanks On Aug 25, 2015, at 6:57 PM, Todd bit1...@163.com wrote: Thanks Ted Yu. Following are the error message: 1. The exception that is shown on the UI is : Exception in thread Thread-113 Exception in thread Thread-126 Exception

Re:Re: Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Todd
to understand more about scope of modules: https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html On Tue, Aug 25, 2015 at 12:18 PM, Todd bit1...@163.com wrote: I cloned the code from https://github.com/apache/spark to my machine. It can compile successfully

Re:RE: Test case for the spark sql catalyst

2015-08-25 Thread Todd
Thanks Chenghao! At 2015-08-25 13:06:40, Cheng, Hao hao.ch...@intel.com wrote: Yes, check the source code under:https://github.com/apache/spark/tree/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst From: Todd [mailto:bit1...@163.com] Sent: Tuesday, August 25, 2015 1:01

Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Todd
I cloned the code from https://github.com/apache/spark to my machine. It can compile successfully, but when I run the SparkPi example, it throws an exception below complaining that scala.collection.Seq is not found. I have installed scala 2.10.4 on my machine, and use the default profiles:

Re:Re: What does Attribute and AttributeReference mean in Spark SQL

2015-08-25 Thread Todd
:13 PM, Todd bit1...@163.com wrote: There are many such kind of case class or concept such as Attribute/AttributeReference/Expression in Spark SQL I would ask what Attribute/AttributeReference/Expression mean, given a sql query like select a,b from c, it a, b are two Attributes? a + b

How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
Hi, The spark sql perf itself contains benchmark data generation. I am using spark shell to run the spark sql perf to generate the data with 10G memory for both driver and executor. When I increase the scaleFactor to 30 and run the job, I get the following error: When I jstack it to

Re:Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Todd
- or paste error in text. Cheers On Tue, Aug 25, 2015 at 4:22 AM, Todd bit1...@163.com wrote: Hi, The spark sql perf itself contains benchmark data generation. I am using spark shell to run the spark sql perf to generate the data with 10G memory for both driver and executor. When I increase

What does Attribute and AttributeReference mean in Spark SQL

2015-08-24 Thread Todd
There are many such kinds of case classes or concepts such as Attribute/AttributeReference/Expression in Spark SQL. I would ask what Attribute/AttributeReference/Expression mean. Given a sql query like select a,b from c, are a and b two Attributes? Is a + b an expression? Looks like I misunderstand it

Test case for the spark sql catalyst

2015-08-24 Thread Todd
Hi, Are there test cases for the spark sql catalyst, such as testing the rules for transforming an unresolved query plan? Thanks!

Re:SPARK sql :Need JSON back isntead of roq

2015-08-21 Thread Todd
please try DataFrame.toJSON, it will give you an RDD of JSON strings. At 2015-08-21 15:59:43, smagadi sudhindramag...@fico.com wrote: val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19") I need teenagers to be a JSON object rather than a simple row. How can we get
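A sketch of the suggestion, continuing the example from the post (sqlContext as in the spark shell):

val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
val teenagersJson = teenagers.toJSON  // RDD[String], one JSON document per row
teenagersJson.take(5).foreach(println)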

blogs/articles/videos on how to analyse spark performance

2015-08-19 Thread Todd
Hi, I would ask if there are some blogs/articles/videos on how to analyse spark performance at runtime, e.g., tools that can be used or something related.

Re:Re: How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Todd
? Is there a way to auto relaunch if driver runs as a Hadoop Yarn Application? On Wednesday, 19 August 2015 12:49 PM, Todd bit1...@163.com wrote: There is an option for the spark-submit (Spark standalone or Mesos with cluster deploy mode only) --supervise If given, restarts

Re:How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Todd
There is an option for spark-submit (Spark standalone or Mesos with cluster deploy mode only): --supervise. If given, it restarts the driver on failure. At 2015-08-19 14:55:39, Spark Enthusiast sparkenthusi...@yahoo.in wrote: Folks, As I see, the Driver program is a
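For example, in standalone cluster mode; the master URL, class, and jar are hypothetical:

spark-submit \
  --master spark://master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.MyStreamingApp \
  my-app.jar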

Does spark sql support column indexing

2015-08-19 Thread Todd
I can't find any related discussion on whether spark sql supports column indexing. If it does, is there a guide on how to do it? Thanks.

Why there are overlapping for tasks on the EventTimeline UI

2015-08-18 Thread Todd
Hi, Following is copied from the spark EventTimeline UI. I don't understand why there is overlap between tasks. I think they should run sequentially, one by one, in one executor (there is one core per executor). The blue part of each task is the scheduler delay time. Does it mean it is the
