Re: Link existing Hive to Spark

2015-02-06 Thread Todd Nist
Hi Ashu, Per the documents: Configuration of Hive is done by placing your hive-site.xml file in conf/. For example, you can place something like this in your $SPARK_HOME/conf/hive-site.xml file: <configuration> <property> <name>hive.metastore.uris</name> <!-- Ensure that the following statement
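A minimal sketch of what that hive-site.xml might look like, assuming a remote metastore service; the host and port are placeholders:

    <configuration>
      <property>
        <name>hive.metastore.uris</name>
        <!-- Point this at the thrift URI of your running Hive metastore -->
        <value>thrift://localhost:9083</value>
      </property>
    </configuration>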

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
; NULL Michael / 30 Andy / 19 Justin. Time taken: 0.576 seconds. From: Todd Nist Date: Tuesday, February 10, 2015 at 6:49 PM To: Silvio Fiorito Cc: user@spark.apache.org Subject: Re: SparkSQL + Tableau Connector Hi Silvio, Ah, I like

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
11, 2015 at 3:59 PM, Todd Nist tsind...@gmail.com wrote: Hi Arush, So yes, I want to create the tables through Spark SQL. I have placed the hive-site.xml file inside the $SPARK_HOME/conf directory; I thought that was all I should need to do to have the thriftserver use it. Perhaps my hive

Re: Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
.html On Thu, Feb 12, 2015 at 7:24 AM, Todd Nist tsind...@gmail.com wrote: I have a question with regard to accessing SchemaRDD’s and Spark SQL temp tables via the thrift server. It appears that a SchemaRDD, when created, is only available in the local namespace / context and is unavailable

Re: No suitable driver found error, Create table in hive from spark sql

2015-02-19 Thread Todd Nist
Hi Dhimant, I believe if you change your spark-shell invocation to pass --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar instead of putting it in --jars, that should resolve it. -Todd On Wed, Feb 18, 2015 at 10:41 PM, Dhimant dhimant84.jays...@gmail.com wrote: Found solution from one of the posts found on
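The full invocation would look something like this (the connector jar path is the one from the message above):

    ./bin/spark-shell --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar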

Re: Tableau beta connector

2015-02-19 Thread Todd Nist
I am able to connect by doing the following, using the Tableau Initial SQL and a custom query: 1. First ingest the csv or json file and save it out to the file system: import org.apache.spark.sql.SQLContext import com.databricks.spark.csv._ val sqlContext = new SQLContext(sc) val demo =
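The preview is cut off at "val demo ="; a sketch of how that first step might continue, assuming the spark-csv package and a hypothetical demo.csv:

    import org.apache.spark.sql.SQLContext
    import com.databricks.spark.csv._

    val sqlContext = new SQLContext(sc)
    // csvFile comes from the spark-csv implicits imported above
    val demo = sqlContext.csvFile("/path/to/demo.csv")
    // Save to the file system so the thrift server can pick it up later
    demo.saveAsParquetFile("/tmp/demo.parquet")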

Re: Where to look for potential causes for Akka timeout errors in a Spark Streaming Application?

2015-02-20 Thread Todd Nist
Hi Emre, Have you tried adjusting these: .set("spark.akka.frameSize", "500").set("spark.akka.askTimeout", "30").set("spark.core.connection.ack.wait.timeout", "600") -Todd On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, We are building a Spark Streaming application that

Re: SparkSQL + Tableau Connector

2015-02-19 Thread Todd Nist
in the schema. In that case you will either have to generate the Hive tables externally from Spark or use Spark to process the data and save them using a HiveContext. From: Todd Nist Date: Wednesday, February 11, 2015 at 7:53 PM To: Andrew Lee Cc: Arush Kharbanda, user@spark.apache.org

Re: Set EXTRA_JAR environment variable for spark-jobserver

2015-01-06 Thread Todd Nist
@Sasi You should be able to create a job something like this: package io.radtech.spark.jobserver import java.util.UUID import org.apache.spark.{ SparkConf, SparkContext } import org.apache.spark.rdd.RDD import org.joda.time.DateTime import com.datastax.spark.connector.types.TypeConverter
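The preview cuts off after the imports; the general shape of a spark-jobserver job is an object implementing the SparkJob trait. A hedged sketch (the object name and body are illustrative, not from the original message):

    import com.typesafe.config.Config
    import org.apache.spark.SparkContext
    import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

    object SampleJob extends SparkJob {
      // Called by the job server to sanity-check the submitted config
      override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

      // The actual work; the return value is serialized back to the caller
      override def runJob(sc: SparkContext, config: Config): Any =
        sc.parallelize(1 to 100).map(_ * 2).sum()
    }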

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
using --files hive-site.xml. Similarly, you can specify the same metastore to your spark-submit or spark-shell using the same option. On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote: Arush, As for #2, do you mean something like this from the docs: // sc is an existing

SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Hi, I'm trying to understand how and what the Tableau connector to SparkSQL is able to access. My understanding is that it needs to connect to the thriftserver, but I am not sure how, or whether it exposes parquet, json, and schemaRDDs, or only schemas defined in the metastore / hive. For

Is it possible to expose SchemaRDD’s from thrift server?

2015-02-12 Thread Todd Nist
I have a question with regard to accessing SchemaRDD’s and Spark SQL temp tables via the thrift server. It appears that a SchemaRDD, when created, is only available in the local namespace / context and is unavailable to external services accessing Spark through the thrift server via ODBC; is this
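One approach that comes up elsewhere in these threads is HiveThriftServer2.startWithContext, which starts the thrift server inside the application so JDBC/ODBC clients share its context; a hedged sketch (the json source path is hypothetical):

    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

    val hiveContext = new HiveContext(sc)
    // Temp tables registered on this same context are visible to
    // JDBC/ODBC clients of the embedded thrift server
    val df = hiveContext.jsonFile("/path/to/data.json")
    df.registerTempTable("mydata")
    HiveThriftServer2.startWithContext(hiveContext)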

Re: Unable to query hive tables from spark

2015-02-15 Thread Todd Nist
What does your hive-site.xml look like? Do you actually have a directory at the location shown in the error? i.e does /user/hive/warehouse/src exist? You should be able to override this by specifying the following: --hiveconf hive.metastore.warehouse.dir=/location/where/your/warehouse/exists

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
/resources/kv1.txt' INTO TABLE src") // Queries are expressed in HiveQL: sqlContext.sql("FROM src SELECT key, value").collect().foreach(println) Or did you have something else in mind? -Todd On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote: Arush, Thank you, will take a look

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
fashion, sort of related to question 2: you would need to configure thrift to read from the metastore you expect it to read from - by default it reads from the metastore_db directory present in the directory used to launch the thrift server. On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote: Hi

Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
users using org.apache.spark.sql.parquet options (path 'examples/src/main/resources/users.parquet'); cache table users From: Todd Nist Date: Tuesday, February 10, 2015 at 3:03 PM To: user@spark.apache.org Subject: SparkSQL + Tableau Connector Hi, I'm trying to understand how and what
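Reconstructed, the Initial SQL being quoted is presumably along these lines (treat as a sketch; the truncated preview starts mid-statement):

    create temporary table users
    using org.apache.spark.sql.parquet
    options (path 'examples/src/main/resources/users.parquet');
    cache table users;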

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Todd Nist
Hi Bharath, I ran into the same issue a few days ago; here is a link to a post on Horton's forum. http://hortonworks.com/community/forums/search/spark+1.2.1/ In case anyone else needs to perform this, these are the steps I took to get it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3: 1.

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-17 Thread Todd Nist
in the yarn cluster? I'd assume that the latter shouldn't be necessary. On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist tsind...@gmail.com wrote: Hi Bharath, I ran into the same issue a few days ago, here is a link to a post on Horton's forum. http://hortonworks.com/community/forums/search

Re: [SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-18 Thread Todd Nist
problem first? From: Todd Nist [mailto:tsind...@gmail.com] Sent: Thursday, March 19, 2015 7:49 AM To: user@spark.apache.org Subject: [SQL] Elasticsearch-hadoop, exception creating temporary table I am attempting to access ElasticSearch and expose its data through SparkSQL using

[SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-18 Thread Todd Nist
I am attempting to access ElasticSearch and expose its data through SparkSQL using the elasticsearch-hadoop project. I am encountering the following exception when trying to create a Temporary table from a resource in ElasticSearch: 15/03/18 07:54:46 INFO DAGScheduler: Job 2 finished: runJob

[Spark SQL] Elasticsearch-hadoop - exception when creating Temporary table

2015-03-18 Thread Todd Nist
I am attempting to access ElasticSearch and expose its data through SparkSQL using the elasticsearch-hadoop project. I am encountering the following exception when trying to create a Temporary table from a resource in ElasticSearch: 15/03/18 07:54:46 INFO DAGScheduler: Job 2 finished: runJob

Re: hbase sql query

2015-03-12 Thread Todd Nist
is also based on scala; I was looking for some help with Java APIs. Thanks, Udbhav Agarwal From: Todd Nist [mailto:tsind...@gmail.com] Sent: 12 March, 2015 5:28 PM To: Udbhav Agarwal Cc: Akhil Das; user@spark.apache.org Subject: Re: hbase sql query Have you considered

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist tsind...@gmail.com wrote: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
failed in the first place. Thanks. Zhan Zhang On Mar 6, 2015, at 9:59 AM, Todd Nist tsind...@gmail.com wrote: First, thanks to everyone for their assistance and recommendations. @Marcelo I applied the patch that you recommended and am now able to get into the shell, thank you worked

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-06 Thread Todd Nist
, at 11:40 AM, Zhan Zhang zzh...@hortonworks.com wrote: You are using 1.2.1 right? If so, please add java-opts in conf directory and give it a try. [root@c6401 conf]# more java-opts -Dhdp.version=2.2.2.0-2041 Thanks. Zhan Zhang On Mar 6, 2015, at 11:35 AM, Todd Nist tsind

Re: Visualizing the DAG of a Spark application

2015-03-13 Thread Todd Nist
There is the PR https://github.com/apache/spark/pull/2077 for doing this. On Fri, Mar 13, 2015 at 6:42 AM, t1ny wbr...@gmail.com wrote: Hi all, We are looking for a tool that would let us visualize the DAG generated by a Spark application as a simple graph. This graph would represent the

Re: hbase sql query

2015-03-12 Thread Todd Nist
Have you considered using the spark-hbase-connector for this: https://github.com/nerdammer/spark-hbase-connector On Thu, Mar 12, 2015 at 5:19 AM, Udbhav Agarwal udbhav.agar...@syncoms.com wrote: Thanks Akhil. Additionally, if we want to do a sql query we need to create a JavaPairRdd, then

Re: Spark as a service

2015-03-24 Thread Todd Nist
Perhaps this project, https://github.com/calrissian/spark-jetty-server, could help with your requirements. On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele jeffrey.jed...@gmail.com wrote: I don't think there's a general approach to that - the use cases are just too different. If you really need

SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array

2015-03-31 Thread Todd Nist
I am accessing ElasticSearch via elasticsearch-hadoop and attempting to expose it via SparkSQL. I am using spark 1.2.1, the latest supported by elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.BUILD-SNAPSHOT". I’m encountering an issue when I

Re: Query REST web service with Spark?

2015-03-31 Thread Todd Nist
Here are a few ways to achieve what you're looking to do: https://github.com/cjnolet/spark-jetty-server Spark Job Server - https://github.com/spark-jobserver/spark-jobserver - defines a REST API for Spark Hue -

Re: SparkSql - java.util.NoSuchElementException: key not found: node when access JSON Array

2015-03-31 Thread Todd Nist
at 3:26 PM, Todd Nist tsind...@gmail.com wrote: I am accessing ElasticSearch via elasticsearch-hadoop and attempting to expose it via SparkSQL. I am using spark 1.2.1, the latest supported by elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" % "2.1.0.BUILD-SNAPSHOT" of elasticsearch

Re: What joda-time dependency does spark submit use/need?

2015-02-27 Thread Todd Nist
You can specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar) either as part of your build and assembly or via the --jars option to spark-submit. HTH. On Fri, Feb 27, 2015 at 2:48 PM, Su She suhsheka...@gmail.com wrote: Hello Everyone, I'm having some issues launching (non-spark)
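For example (the joda jars as named above; the application jar and main class are placeholders):

    ./bin/spark-submit \
      --class com.example.Main \
      --jars joda-time-2.7.jar,joda-convert-1.7.jar \
      my-app-assembly.jar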

Re: Spark Monitoring UI for Hadoop Yarn Cluster

2015-03-03 Thread Todd Nist
Hi Srini, If you start $SPARK_HOME/sbin/start-history-server.sh, you should be able to see the basic spark ui. You will not see the master, but you will be able to see the rest as I recall. You also need to add an entry into the spark-defaults.conf, something like this: ## Make sure the host
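The spark-defaults.conf entries in question are the event-log settings; a sketch, with the log directory as a placeholder that must exist and be writable:

    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://namenode:8021/spark-events
    spark.history.fs.logDirectory    hdfs://namenode:8021/spark-events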

Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Todd Nist
I am running Spark on a HortonWorks HDP Cluster. I have deployed their prebuilt version, but it is only for Spark 1.2.0, not 1.2.1, and there are a few fixes and features in there that I would like to leverage. I just downloaded the spark-1.2.1 source and built it to support Hadoop 2.6 by doing the

Re: Spark Build with Hadoop 2.6, yarn - encounter java.lang.NoClassDefFoundError: org/codehaus/jackson/map/deser/std/StdDeserializer

2015-03-05 Thread Todd Nist
: -Djackson.version=1.9.3 Cheers On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist tsind...@gmail.com wrote: I am running Spark on a HortonWorks HDP Cluster. I have deployed their prebuilt version but it is only for Spark 1.2.0 not 1.2.1 and there are a few fixes and features in there that I would like

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Todd Nist
Hi Kannan, I believe you should be able to use --jars for this when invoking the spark-shell or performing a spark-submit. Per the docs: --jars JARS: Comma-separated list of local jars to include on the driver and executor classpaths. HTH. -Todd On Thu, Feb

Re: Is SPARK_CLASSPATH really deprecated?

2015-02-26 Thread Todd Nist
Hi Kannan, Issues with using --jars make sense. I believe you can set the classpath via --conf spark.executor.extraClassPath= or in your driver with .set("spark.executor.extraClassPath", ...). I believe you are correct with the localize as well, as long as you're guaranteed that all

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-18 Thread Todd Nist
a deployment of the spark distribution or any other config change to support a spark job. Isn't that correct? On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist tsind...@gmail.com wrote: Hi Bharath, Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf file? spark.driver.extraJavaOptions

Re: [SQL] Elasticsearch-hadoop, exception creating temporary table

2015-03-19 Thread Todd Nist
: Seems the elasticsearch-hadoop project was built with an old version of Spark, and then you upgraded the Spark version in the execution env; as I know, the StructField definition changed in Spark 1.2. Can you confirm the version problem first? From: Todd Nist [mailto:tsind...@gmail.com] Sent

Re: Spark SQL 1.3.0 - spark-shell error : HiveMetastoreCatalog.class refers to term cache in package com.google.common which is not available

2015-04-02 Thread Todd Nist
Hi Young, Sorry for the duplicate post, want to reply to all. I just downloaded the bits prebuilt from the apache spark download site. Started the spark shell and got the same error. I then started the shell as follows: ./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2

Re: Spark Sql - Missing Jar ? json_tuple NoClassDefFoundError

2015-04-03 Thread Todd Nist
is download location ? On Fri, Apr 3, 2015 at 3:42 PM, Todd Nist tsind...@gmail.com wrote: Started the spark shell with the one jar from hive suggested: ./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2 --driver-class-path /usr/local/spark/lib/mysql-connector-java

Re: Spark Sql - Missing Jar ? json_tuple NoClassDefFoundError

2015-04-03 Thread Todd Nist
definition (code) of UDF json_tuple. That should solve your problem. On Fri, Apr 3, 2015 at 3:57 PM, Todd Nist tsind...@gmail.com wrote: I placed it there. It was downloaded from MySql site. On Fri, Apr 3, 2015 at 6:25 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: Akhil you mentioned /usr/local

Re: Spark Sql - Missing Jar ? json_tuple NoClassDefFoundError

2015-04-03 Thread Todd Nist
Thanks Best Regards On Fri, Apr 3, 2015 at 2:55 PM, Todd Nist tsind...@gmail.com wrote: Hi Akhil, This is for version 1.2.1. Well the other thread that you reference was me attempting it in 1.3.0 to see if the issue was related to 1.2.1. I did not build Spark but used the version from

Re: Tableau + Spark SQL Thrift Server + Cassandra

2015-04-03 Thread Todd Nist
What version of Cassandra are you using? Are you using DSE or the stock Apache Cassandra version? I have connected it with DSE, but have not attempted it with the standard Apache Cassandra version. FWIW,

Re: Tableau + Spark SQL Thrift Server + Cassandra

2015-04-03 Thread Todd Nist
in Tableau using the ODBC driver that comes with DSE. Once you connect, Tableau allows you to use a C* keyspace as a schema and column families as tables. Mohammed From: pawan kumar [mailto:pkv...@gmail.com] Sent: Friday, April 3, 2015 7:41 AM To: Todd Nist Cc: user@spark.apache.org; Mohammed

Re: Tableau + Spark SQL Thrift Server + Cassandra

2015-04-03 Thread Todd Nist
@Pawan Not sure if you have seen this or not, but here is a good example by Jonathan Lacefield of Datastax on hooking up sparksql with DSE; adding Tableau is as simple as Mohammed stated with DSE. https://github.com/jlacefie/sparksqltest. HTH, Todd On Fri, Apr 3, 2015 at 2:39 PM, Todd Nist

Re: Calculating the averages for each KEY in a Pairwise (K,V) RDD ...

2015-04-28 Thread Todd Nist
Can you simply apply the https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.util.StatCounter to this? You should be able to do something like this: val stats = rdd.map(x => x._2).stats() -Todd On Tue, Apr 28, 2015 at 10:00 AM, subscripti...@prismalytics.io
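For the per-key averages the question asked about, a common alternative is to reduce to a (sum, count) pair per key; a minimal sketch:

    // Hypothetical pair RDD of (key, value)
    val pairs = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 4.0)))

    val avgByKey = pairs
      .mapValues(v => (v, 1L))  // seed each value as (sum, count)
      .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
      .mapValues { case (sum, count) => sum / count }

    avgByKey.collect()  // Array((a,2.0), (b,4.0))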

Spark Streaming Kafka Avro NPE on deserialization of payload

2015-04-30 Thread Todd Nist
I’m very perplexed with the following. I have a set of AVRO generated objects that are sent to a SparkStreaming job via Kafka. The SparkStreaming job follows the receiver-based approach. I am encountering the below error when I attempt to deserialize the payload: 15/04/30 17:49:25 INFO

Spark Streaming Kafka Avro NPE on deserialization of payload

2015-05-01 Thread Todd Nist
Resending as I do not see that this made it to the mailing list; sorry if in fact it did and is just not reflected online yet. I’m very perplexed with the following. I have a set of AVRO generated objects that are sent to a SparkStreaming job via Kafka. The SparkStreaming job follows the

Re: AvroFiles

2015-05-05 Thread Todd Nist
Are you using Kryo or Java serialization? I found this post useful: http://stackoverflow.com/questions/23962796/kryo-readobject-cause-nullpointerexception-with-arraylist If using kryo, you need to register the classes with kryo on the SparkConf, something like this: conf.registerKryoClasses(Array(
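Registration happens on the SparkConf before the context is created; a minimal sketch, with the registered class a placeholder for your own types:

    import org.apache.spark.{SparkConf, SparkContext}

    case class Record(id: String, value: Double)  // placeholder for your classes

    val conf = new SparkConf()
      .setAppName("kryo-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Register every class Kryo will serialize, e.g. your Avro records
      .registerKryoClasses(Array(classOf[Record]))

    val sc = new SparkContext(conf)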

Re: Spark does not delete temporary directories

2015-05-07 Thread Todd Nist
Have you tried setting the following? spark.worker.cleanup.enabled=true spark.worker.cleanup.appDataTtl=<seconds> On Thu, May 7, 2015 at 2:39 AM, Taeyun Kim taeyun@innowireless.com wrote: Hi, After a spark program completes, there are 3 temporary directories remaining in the temp

Parquet Partition Strategy - how to partition data correctly

2015-05-05 Thread Todd Nist
Hi, I have a DataFrame that represents my data and looks like this:
+----------+-----------+
| col_name | data_type |
+----------+-----------+
| obj_id   | string    |
| type     | string    |
| name

Re: value toDF is not a member of RDD object

2015-05-13 Thread Todd Nist
I believe what Dean Wampler was suggesting is to use the sqlContext not the sparkContext (sc), which is where the createDataFrame function resides: https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.sql.SQLContext HTH. -Todd On Wed, May 13, 2015 at 6:00 AM, SLiZn Liu
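A sketch of both routes, assuming a simple case class (which, in compiled code, must be defined outside the method using it):

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    case class Person(name: String, age: Int)
    val rdd = sc.parallelize(Seq(Person("Alice", 30), Person("Bob", 25)))

    // Route 1: implicit conversion; note the import is on the instance
    import sqlContext.implicits._
    val df1 = rdd.toDF()

    // Route 2: explicit construction on the SQLContext
    val df2 = sqlContext.createDataFrame(rdd)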

Re: Cannot saveAsParquetFile from a RDD of case class

2015-04-14 Thread Todd Nist
I think the docs are correct. If you follow the example from the docs and add the import shown below, I believe you will get what you're looking for: // This is used to implicitly convert an RDD to a DataFrame. import sqlContext.implicits._ You could also simply take your rdd and do the following:

Spark SQL Parquet as External table - 1.3.x HiveMetastoreType now hidden

2015-04-06 Thread Todd Nist
In 1.2.1 I was persisting a set of parquet files as a table for use by the spark-sql cli later on. There was a post here http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311 by Michael Armbrust that provides a nice little helper method for dealing

Re: Tableau + Spark SQL Thrift Server + Cassandra

2015-04-03 Thread Todd Nist
are in the remote node. I am not sure if I need to install spark and its dependencies in the webui (Zeppelin) node. I am not sure talking about Zeppelin in this thread is right. Thanks once again for all the help. Thanks, Pawan Venugopal On Fri, Apr 3, 2015 at 11:48 AM, Todd Nist tsind

Re: Tableau + Spark SQL Thrift Server + Cassandra

2015-04-03 Thread Todd Nist
CalliopeServer2, which works like a charm with BI tools that use JDBC, but unfortunately Tableau throws an error when it connects to it. Mohammed From: Todd Nist [mailto:tsind...@gmail.com] Sent: Friday, April 3, 2015 11:39 AM To: pawan kumar Cc: Mohammed Guller; user@spark.apache.org

Re: Advice using Spark SQL and Thrift JDBC Server

2015-04-08 Thread Todd Nist
To use HiveThriftServer2.startWithContext, I thought one would use the following artifact in the build: "org.apache.spark" %% "spark-hive-thriftserver" % "1.3.0" But I am unable to resolve the artifact. I do not see it in maven central or any other repo. Do I need to build Spark and

Re: Spark Sql - Missing Jar ? json_tuple NoClassDefFoundError

2015-04-02 Thread Todd Nist
. If you want the specific jar, you could look for jackson or json serde in it. Thanks Best Regards On Thu, Apr 2, 2015 at 12:49 AM, Todd Nist tsind...@gmail.com wrote: I have a feeling I’m missing a Jar that provides the support, or could this be related to https://issues.apache.org/jira

Spark SQL 1.3.0 - spark-shell error : HiveMetastoreCatalog.class refers to term cache in package com.google.common which is not available

2015-04-02 Thread Todd Nist
I was trying a simple test from the spark-shell to see if 1.3.0 would address a problem I was having with locating the json_tuple class and got the following error: scala> import org.apache.spark.sql.hive._ import org.apache.spark.sql.hive._ scala> val sqlContext = new HiveContext(sc) sqlContext:

Re: Advice using Spark SQL and Thrift JDBC Server

2015-04-09 Thread Todd Nist
down where the dependency was coming from. Based on Patrick's comments it sounds like this is now resolved. Sorry for the confusion. -Todd On Wed, Apr 8, 2015 at 4:38 PM, Todd Nist tsind...@gmail.com wrote: Hi Mohammed, I think you just need to add -DskipTests to your build. Here is how I built

Re: Advice using Spark SQL and Thrift JDBC Server

2015-04-08 Thread Todd Nist
org.apache.spark#spark-network-shuffle_2.10;1.3.0 test [error] Total time: 106 s, completed Apr 8, 2015 12:33:45 PM Mohammed From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Wednesday, April 8, 2015 11:54 AM To: Mohammed Guller Cc: Todd Nist; James Aley; user; Patrick

Re: Spark sql error while writing Parquet file- Trying to write more fields than contained in row

2015-05-19 Thread Todd Nist
I believe you're looking for df.na.fill in scala; in the pySpark module it is fillna (http://spark.apache.org/docs/latest/api/python/pyspark.sql.html). From the docs, df4.fillna({'age': 50, 'name': 'unknown'}).show() yields rows (age, height, name): (10, 80, Alice), (5, null, Bob), (50, null, Tom), (50, null, unknown). On

Re: group by and distinct performance issue

2015-05-19 Thread Todd Nist
You may want to look at this tooling for helping identify performance issues and bottlenecks: https://github.com/kayousterhout/trace-analysis I believe this is slated to become part of the web ui in the 1.4 release, in fact based on the status of the JIRA,

Re: Spark SQL and Streaming Results

2015-06-05 Thread Todd Nist
There used to be a project, StreamSQL (https://github.com/thunderain-project/StreamSQL), but it appears a bit dated and I do not see it in the Spark repo, though I may have missed it. @TD Is this project still active? I'm not sure what the status is, but it may provide some insights on how to achieve

Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Todd Nist
You can get HDP with at least 1.3.1 from Horton: http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/ For your convenience, from the docs: wget -nv http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo

Re: How to pass arguments dynamically, that needs to be used in executors

2015-06-11 Thread Todd Nist
Hi Gaurav, Seems like you could use a broadcast variable for this if I understand your use case. Create it in the driver based on the CommandLineArguments and then use it in the workers. https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables So something like:
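A minimal sketch of the pattern (the argument map is a stand-in for whatever gets parsed from the command line):

    // Driver side: broadcast the parsed arguments once
    val cliArgs = Map("threshold" -> "10")
    val argsBc = sc.broadcast(cliArgs)

    // Executor side: closures read the broadcast value, not a driver variable
    val filtered = sc.parallelize(1 to 100).filter { n =>
      n > argsBc.value("threshold").toInt
    }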

Re: Spark DataFrame Reduce Job Took 40s for 6000 Rows

2015-06-15 Thread Todd Nist
Hi Proust, Is it possible to see the query you are running and can you run EXPLAIN EXTENDED to show the physical plan for the query. To generate the plan you can do something like this from $SPARK_HOME/bin/beeline: 0: jdbc:hive2://localhost:10001 explain extended select * from YourTableHere;

Re: Spark 1.4 release date

2015-06-12 Thread Todd Nist
It was released yesterday. On Friday, June 12, 2015, ayan guha guha.a...@gmail.com wrote: Hi When is official spark 1.4 release date? Best Ayan

Re: Setting JVM heap start and max sizes, -Xms and -Xmx, for executors

2015-07-02 Thread Todd Nist
to be a limitation at this time. -Todd On Thu, Jul 2, 2015 at 4:13 PM, Mulugeta Mammo mulugeta.abe...@gmail.com wrote: thanks but my use case requires I specify different start and max heap sizes. Looks like spark sets start and max sizes same value. On Thu, Jul 2, 2015 at 1:08 PM, Todd Nist tsind

Re: Setting JVM heap start and max sizes, -Xms and -Xmx, for executors

2015-07-02 Thread Todd Nist
You should use spark.executor.memory. From the docs https://spark.apache.org/docs/latest/configuration.html: spark.executor.memory (default: 512m): Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g). -Todd On Thu, Jul 2, 2015 at 3:36 PM, Mulugeta Mammo

spark.executor.extraClassPath - Values not picked up by executors

2015-05-22 Thread Todd Nist
I'm using the spark-cassandra-connector from DataStax in a spark streaming job launched from my own driver. It is connecting to a standalone cluster on my local box which has two workers running. This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have added the following entry to

Re: spark.executor.extraClassPath - Values not picked up by executors

2015-05-23 Thread Todd Nist
://datastax-oss.atlassian.net/browse/SPARKC-98 is still open... On Fri, May 22, 2015 at 6:15 PM, Todd Nist tsind...@gmail.com wrote: I'm using the spark-cassandra-connector from DataStax in a spark streaming job launched from my own driver. It is connecting to a standalone cluster on my local box which

Re: Question about Serialization in Storage Level

2015-05-21 Thread Todd Nist
From the docs, https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence: MEMORY_ONLY: Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, some partitions will not be cached and will be recomputed on the fly each time they're

Re: Starting Spark SQL thrift server from within a streaming app

2015-08-06 Thread Todd Nist
on a streaming app? Thanks again. Daniel On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist tsind...@gmail.com wrote: Hi Danniel, It is possible to create an instance of the SparkSQL Thrift server; however, it seems like this project is what you may be looking for: https://github.com/Intel-bigdata/spark

Re: How can I know currently supported functions in Spark SQL

2015-08-06 Thread Todd Nist
They are covered here in the docs: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$ On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wanglong_...@163.com wrote: Hi All, I am using Spark 1.4.1, and I want to know how can I find the complete function

Re: Starting Spark SQL thrift server from within a streaming app

2015-08-05 Thread Todd Nist
Hi Danniel, It is possible to create an instance of the SparkSQL Thrift server; however, it seems like this project is what you may be looking for: https://github.com/Intel-bigdata/spark-streamingsql Not 100% sure what your use case is, but you can always convert the data into a DF and then issue a query

Re: Use rank with distribute by in HiveContext

2015-07-16 Thread Todd Nist
Did you take a look at the excellent write up by Yin Huai and Michael Armbrust? It appears that rank is supported in the 1.4.x release. https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html Snippet from above article for your convenience: To answer the first
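The pattern from the article, paraphrased as a sketch (requires 1.4+ and a HiveContext; the table and column names follow the blog's example):

    val top2PerCategory = sqlContext.sql("""
      SELECT product, category, revenue FROM (
        SELECT product, category, revenue,
               rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
        FROM productRevenue) ranked
      WHERE rnk <= 2""")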

Re: spark streaming job to hbase write

2015-07-15 Thread Todd Nist
There are three connector packages listed on the spark packages web site: http://spark-packages.org/?q=hbase HTH. -Todd On Wed, Jul 15, 2015 at 2:46 PM, Shushant Arora shushantaror...@gmail.com wrote: Hi I have a requirement of writing to an hbase table from a Spark streaming app after some

Re: Does Spark streaming support is there with RabbitMQ

2015-07-20 Thread Todd Nist
There is one package available on the spark-packages site, http://spark-packages.org/package/Stratio/RabbitMQ-Receiver The source is here: https://github.com/Stratio/RabbitMQ-Receiver Not sure that meets your needs or not. -Todd On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele

Re: java.lang.NegativeArraySizeException? as iterating a big RDD

2015-10-23 Thread Todd Nist
Hi Yifan, You could also try increasing spark.kryoserializer.buffer.max.mb (64 MB by default): useful if your default buffer size goes further than 64 MB. Per the doc: Maximum allowable size of Kryo serialization buffer. This must be larger than any object

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
2.11 artifacts are in fact published: > http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22 > > On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist <tsind...@gmail.com> wrote: > > Sorry Sean, you are absolutely right, it supports 2.11; all I meant is > there is > >

Re: Maven build failed (Spark master)

2015-10-27 Thread Todd Nist
I issued the same basic command and it worked fine. RADTech-MBP:spark $ ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests Which created: spark-1.6.0-SNAPSHOT-bin-hadoop-2.6.tgz in the root directory of the project.

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
Hi Bilnmek, Spark 1.5.x does not support Scala 2.11.7 out of the box, so the easiest thing to do is build it like you're trying. Here are the steps I followed to build it on a Mac OS X 10.10.5 environment; it should be very similar on ubuntu. 1. Set the JAVA_HOME environment variable in my bash session via export

Re: Newbie Help for spark compilation problem

2015-10-25 Thread Todd Nist
t support 2.11? It does. > > It is not even this difficult; you just need a source distribution, > and then run "./dev/change-scala-version.sh 2.11" as you say. Then > build as normal > > On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist <tsind...@gmail.com >

Re: Spark SQL Thriftserver and Hive UDF in Production

2015-10-19 Thread Todd Nist
From Tableau, you should be able to use the Initial SQL option to support this. So in Tableau, add the following to the "Initial SQL": create function myfunc AS 'myclass' using jar 'hdfs:///path/to/jar'; HTH, Todd On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar

Re: Saving RDD into cassandra keyspace.

2015-07-10 Thread Todd Nist
I would strongly encourage you to read the docs; they are very useful in getting up and running: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md For your use case shown above, you will need to ensure that you include the appropriate version of the

Re: [X-post] Saving SparkSQL result RDD to Cassandra

2015-07-09 Thread Todd Nist
foreachRDD returns a unit: def foreachRDD(foreachFunc: RDD[T] => Unit): Unit (see https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html). Apply a function to each RDD in this DStream. This is an output operator, so 'this' DStream will be registered as an output stream and
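So the write has to happen inside the function passed to foreachRDD; a sketch using the spark-cassandra-connector, with the keyspace, table, and element type as placeholders:

    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical stream of (id, count) pairs
    def saveStream(stream: DStream[(String, Int)]): Unit =
      stream.foreachRDD { rdd =>
        // Runs once per batch interval; the save itself is distributed
        rdd.saveToCassandra("my_keyspace", "my_table")
      }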

Re: Tungsten and Spark Streaming

2015-09-10 Thread Todd Nist
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming -Todd On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh < gurvinder.si...@uninett.no> wrote: > On 09/10/2015 07:42 AM, Tathagata Das wrote: > > Rewriting is necessary. You will have to

Re: Replacing Esper with Spark Streaming?

2015-09-14 Thread Todd Nist
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi CEP engine. I have not used the below, but they may be of some value to you: http://stratio.github.io/streaming-cep-engine/ https://github.com/Stratio/streaming-cep-engine HTH. -Todd On Sun, Sep 13, 2015 at 7:49 PM,

Re: KafkaProducer using Cassandra as source

2015-09-23 Thread Todd Nist
Hi Kali, If you do not mind sending JSON, you could do something like this, using json4s: val rows = p.collect() map ( row => TestTable(row.getString(0), row.getString(1)) ) val json = parse(write(rows)) producer.send(new KeyedMessage[String, String]("trade", writePretty(json))) // or for

Re: Securing objects on the thrift server

2015-12-15 Thread Todd Nist
See https://issues.apache.org/jira/browse/SPARK-11043; it is resolved in 1.6. On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib < younes.nag...@tritondigital.com> wrote: > The one coming with spark 1.5.2. > > y > > From: Ted Yu [mailto:yuzhih...@gmail.com] > Sent: December-15-15 1:59 PM

Re: looking for a easier way to count the number of items in a JavaDStream

2015-12-16 Thread Todd Nist
Another possible alternative is to register a StreamingListener and then reference the BatchInfo.numRecords; good example here, https://gist.github.com/akhld/b10dc491aad1a2007183. After registering the listener, simply implement the appropriate "onEvent" method, where onEvent is onBatchStarted,
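A minimal sketch of that, counting records per completed batch (assumes an existing StreamingContext named ssc):

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
        println("Records in batch: " + batch.batchInfo.numRecords)
    })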

Re: write new data to mysql

2016-01-08 Thread Todd Nist
Sorry, did not see your update until now. On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Yasemin, > > What version of Spark are you using? Here is the reference, it is off of > the DataFrame > https://spark.apache.org/docs/lates

Re: write new data to mysql

2016-01-08 Thread Todd Nist
that Todd mentioned, or I can't find it. > The code and error are in gist > <https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check > it out please? > > Best, > yasemin > > 2016-01-08 18:23 GMT+02:00 Todd Nist <tsind...@gmail.com>: > >> It

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
That should read "I think your missing the --name option". Sorry about that. On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Jade, > > I think you "--name" option. The makedistribution should look like this: > > ./make-distr

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
Hi Jade, I think you "--name" option. The make-distribution should look like this: ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests. As for why it failed to build with scala 2.11, did you run the

Re: problem building spark on centos

2016-01-06 Thread Todd Nist
i.apache.org/confluence/display/MAVEN/PluginExecutionException > [ERROR] > [ERROR] After correcting the problems, you can resume the build with the > command > [ERROR] mvn -rf :spark-launcher_2.10 > > Do you think it’s a java problem? I’m using oracle JDK 1.7. Should I update > it to

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
Hi Abhi, You should be able to register an org.apache.spark.streaming.scheduler.StreamingListener. There is an example here that may help: https://gist.github.com/akhld/b10dc491aad1a2007183 and the spark api docs here,

Re: Getting the batch time of the active batches in spark streaming

2015-11-24 Thread Todd Nist
override def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted) { println("Start time: " + batchSubmitted.batchInfo.processingStartTime) } Sorry for the confusion. -Todd On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist <tsind...@gmail.com> wrote: > Hi Abhi, > > You s
