Hi Ashu,
Per the documents:
Configuration of Hive is done by placing your hive-site.xml file in conf/.
For example, you can place something like this in your
$SPARK_HOME/conf/hive-site.xml file:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement
NULL Michael
30   Andy
19   Justin
NULL Michael
30   Andy
19   Justin
Time taken: 0.576 seconds
From: Todd Nist
Date: Tuesday, February 10, 2015 at 6:49 PM
To: Silvio Fiorito
Cc: user@spark.apache.org
Subject: Re: SparkSQL + Tableau Connector
Hi Silvio,
Ah, I like
11, 2015 at 3:59 PM, Todd Nist tsind...@gmail.com wrote:
Hi Arush,
So yes, I want to create the tables through Spark SQL. I have placed the
hive-site.xml file inside of the $SPARK_HOME/conf directory; I thought that
was all I should need to do to have the thriftserver use it. Perhaps my
hive
.html
On Thu, Feb 12, 2015 at 7:24 AM, Todd Nist tsind...@gmail.com wrote:
I have a question with regard to accessing SchemaRDDs and Spark SQL
temp tables via the thrift server. It appears that a SchemaRDD, when
created, is only available in the local namespace / context and is
unavailable
Hi Dhimant,
I believe it will work if you change your spark-shell invocation to pass --driver-class-path
/usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar instead of putting it in
--jars.
-Todd
On Wed, Feb 18, 2015 at 10:41 PM, Dhimant dhimant84.jays...@gmail.com
wrote:
Found the solution from one of the posts found on
I am able to connect by doing the following using the Tableau Initial SQL
and a custom query:
1.
First ingest csv file or json and save out to file system:
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._
val sqlContext = new SQLContext(sc)
val demo =
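A minimal sketch of the full ingest step, assuming the spark-csv package and a Spark 1.2/1.3-era API; the file paths are placeholders, not from the original thread:

import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)

// "demo.csv" and the output path below are placeholders for your data.
val demo = sqlContext.csvFile("/tmp/demo.csv")    // load the CSV via the spark-csv helper
demo.saveAsParquetFile("/tmp/demo.parquet")       // persist so the thrift server / Tableau can query it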
Hi Emre,
Have you tried adjusting these:
.set("spark.akka.frameSize", "500")
.set("spark.akka.askTimeout", "30")
.set("spark.core.connection.ack.wait.timeout", "600")
-Todd
On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
We are building a Spark Streaming application that
in the schema. In that case you will either have to generate the Hive
tables externally from Spark or use Spark to process the data and save them
using a HiveContext.
From: Todd Nist
Date: Wednesday, February 11, 2015 at 7:53 PM
To: Andrew Lee
Cc: Arush Kharbanda, user@spark.apache.org
*@Sasi*
You should be able to create a job something like this:
package io.radtech.spark.jobserver
import java.util.UUID
import org.apache.spark.{ SparkConf, SparkContext }
import org.apache.spark.rdd.RDD
import org.joda.time.DateTime
import com.datastax.spark.connector.types.TypeConverter
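The body of the job is not shown above; a skeleton along the lines of the spark-jobserver SparkJob API of that era might look like this (the object name and the work inside runJob are placeholders):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Skeleton job-server job; replace the body of runJob with the real work.
object CassandraLoadJob extends SparkJob {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    sc.parallelize(1 to 100).sum()   // placeholder work, e.g. replace with a write to Cassandra
  }
}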
using --files hive-site.xml.
Similarly, you can specify the same metastore to your spark-submit or
spark-shell using the same option.
On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote:
Arush,
As for #2 do you mean something like this from the docs:
// sc is an existing
Hi,
I'm trying to understand how and what the Tableau connector to SparkSQL is
able to access. My understanding is that it needs to connect to the
thriftserver, and I am not sure how or if it exposes parquet, json,
SchemaRDDs, or whether it only exposes schemas defined in the metastore / Hive.
For
I have a question with regard to accessing SchemaRDDs and Spark SQL temp
tables via the thrift server. It appears that a SchemaRDD, when created, is
only available in the local namespace / context and is unavailable to
external services accessing Spark through the thrift server via ODBC; is this
What does your hive-site.xml look like? Do you actually have a directory
at the location shown in the error? I.e., does /user/hive/warehouse/src
exist? You should be able to override this by specifying the following:
--hiveconf
hive.metastore.warehouse.dir=/location/where/your/warehouse/exists
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
Or did you have something else in mind?
-Todd
On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:
Arush,
Thank you, will take a look.
fashion; sort of related to question 2, you would need to configure thrift
to read from the metastore you expect it to read from - by default it reads
from the metastore_db directory present in the directory used to launch the
thrift server.
On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:
Hi
CREATE TEMPORARY TABLE users
USING org.apache.spark.sql.parquet
OPTIONS (path 'examples/src/main/resources/users.parquet')
CACHE TABLE users
From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: user@spark.apache.org
Subject: SparkSQL + Tableau Connector
Hi,
I'm trying to understand how and what
Hi Bharath,
I ran into the same issue a few days ago; here is a link to a post on
Horton's forum: http://hortonworks.com/community/forums/search/spark+1.2.1/
In case anyone else needs to perform this, these are the steps I took to get
it to work with Spark 1.2.1 as well as Spark 1.3.0-RC3:
1.
in the yarn cluster? I'd assume that the latter shouldn't be
necessary.
On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist tsind...@gmail.com wrote:
Hi Bharath,
I ran into the same issue a few days ago; here is a link to a post on
Horton's forum.
http://hortonworks.com/community/forums/search
problem first?
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent:* Thursday, March 19, 2015 7:49 AM
*To:* user@spark.apache.org
*Subject:* [SQL] Elasticsearch-hadoop, exception creating temporary table
I am attempting to access Elasticsearch and expose its data through
SparkSQL using
I am attempting to access Elasticsearch and expose its data through
SparkSQL using the elasticsearch-hadoop project. I am encountering the
following exception when trying to create a temporary table from a resource
in Elasticsearch:
15/03/18 07:54:46 INFO DAGScheduler: Job 2 finished: runJob
is also based on Scala; I was looking for some help with
the Java APIs.
*Thanks,*
*Udbhav Agarwal*
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent:* 12 March, 2015 5:28 PM
*To:* Udbhav Agarwal
*Cc:* Akhil Das; user@spark.apache.org
*Subject:* Re: hbase sql query
Have you considered
On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist tsind...@gmail.com wrote:
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:166)
at
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163
failed in the first place.
Thanks.
Zhan Zhang
On Mar 6, 2015, at 9:59 AM, Todd Nist tsind...@gmail.com wrote:
First, thanks to everyone for their assistance and recommendations.
@Marcelo
I applied the patch that you recommended and am now able to get into the
shell; thank you, that worked.
, at 11:40 AM, Zhan Zhang zzh...@hortonworks.com wrote:
You are using 1.2.1, right? If so, please add a java-opts file in the conf
directory and give it a try.
[root@c6401 conf]# more java-opts
-Dhdp.version=2.2.2.0-2041
Thanks.
Zhan Zhang
On Mar 6, 2015, at 11:35 AM, Todd Nist tsind
There is the PR https://github.com/apache/spark/pull/2077 for doing this.
On Fri, Mar 13, 2015 at 6:42 AM, t1ny wbr...@gmail.com wrote:
Hi all,
We are looking for a tool that would let us visualize the DAG generated by
a
Spark application as a simple graph.
This graph would represent the
Have you considered using the spark-hbase-connector for this:
https://github.com/nerdammer/spark-hbase-connector
On Thu, Mar 12, 2015 at 5:19 AM, Udbhav Agarwal udbhav.agar...@syncoms.com
wrote:
Thanks Akhil.
Additionally, if we want to do a SQL query we need to create a JavaPairRDD, then
Perhaps this project, https://github.com/calrissian/spark-jetty-server,
could help with your requirements.
On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele jeffrey.jed...@gmail.com
wrote:
I don't think there's a general approach to that - the use cases are just
too different. If you really need
I am accessing Elasticsearch via elasticsearch-hadoop and attempting to
expose it via SparkSQL. I am using Spark 1.2.1, the latest supported by
elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" %
"2.1.0.BUILD-SNAPSHOT" of elasticsearch-hadoop. I'm
encountering an issue when I
Here are a few ways to achieve what you're looking to do:
https://github.com/cjnolet/spark-jetty-server
Spark Job Server - https://github.com/spark-jobserver/spark-jobserver -
defines a REST API for Spark
Hue -
at 3:26 PM, Todd Nist tsind...@gmail.com wrote:
I am accessing Elasticsearch via elasticsearch-hadoop and attempting
to expose it via SparkSQL. I am using Spark 1.2.1, the latest supported by
elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" %
"2.1.0.BUILD-SNAPSHOT" of elasticsearch
You can specify these jars (joda-time-2.7.jar, joda-convert-1.7.jar) either
as part of your build and assembly or via the --jars option to spark-submit.
HTH.
On Fri, Feb 27, 2015 at 2:48 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I'm having some issues launching (non-spark)
Hi Srini,
If you start $SPARK_HOME/sbin/start-history-server.sh, you should be able
to see the basic Spark UI. You will not see the master, but you will be
able to see the rest as I recall. You also need to add an entry into
spark-defaults.conf, something like this:
## Make sure the host
I am running Spark on a HortonWorks HDP cluster. I have deployed the
prebuilt version, but it is only for Spark 1.2.0, not 1.2.1, and there are a
few fixes and features in there that I would like to leverage.
I just downloaded the spark-1.2.1 source and built it to support Hadoop 2.6
by doing the
:
-Djackson.version=1.9.3
Cheers
On Thu, Mar 5, 2015 at 10:04 AM, Todd Nist tsind...@gmail.com wrote:
I am running Spark on a HortonWorks HDP cluster. I have deployed the
prebuilt version, but it is only for Spark 1.2.0, not 1.2.1, and there are
a few fixes and features in there that I would like
Hi Kannan,
I believe you should be able to use the --jars option for this when you invoke
spark-shell or perform a spark-submit. Per the docs:
--jars JARS    Comma-separated list of local jars to include on the driver
               and executor classpaths.
HTH.
-Todd
On Thu, Feb
Hi Kannan,
Issues with using --jars make sense. I believe you can set the classpath
via the --conf spark.executor.extraClassPath=... option, or in your driver
with .set("spark.executor.extraClassPath", "...").
I believe you are correct about localizing as well, as long as you're
guaranteed that all
a deployment of the
spark distribution or any other config change to support a spark job.
Isn't that correct?
On Tue, Mar 17, 2015 at 6:19 PM, Todd Nist tsind...@gmail.com wrote:
Hi Bharath,
Do you have these entries in your $SPARK_HOME/conf/spark-defaults.conf
file?
spark.driver.extraJavaOptions
:
It seems the elasticsearch-hadoop project was built with an old version of
Spark, and then you upgraded the Spark version in the execution env. As far
as I know, the definition of StructField changed in Spark 1.2; can you
confirm the version problem first?
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent
Hi Young,
Sorry for the duplicate post; I wanted to reply to all.
I just downloaded the prebuilt bits from the Apache Spark download site.
Started the Spark shell and got the same error.
I then started the shell as follows:
./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
is download location ?
On Fri, Apr 3, 2015 at 3:42 PM, Todd Nist tsind...@gmail.com wrote:
Started the spark shell with the one jar from hive suggested:
./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
--driver-class-path /usr/local/spark/lib/mysql-connector-java
definition (code) of UDF json_tuple. That should solve
your problem.
On Fri, Apr 3, 2015 at 3:57 PM, Todd Nist tsind...@gmail.com wrote:
I placed it there. It was downloaded from the MySQL site.
On Fri, Apr 3, 2015 at 6:25 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
wrote:
Akhil
you mentioned /usr/local
Thanks
Best Regards
On Fri, Apr 3, 2015 at 2:55 PM, Todd Nist tsind...@gmail.com wrote:
Hi Akhil,
This is for version 1.2.1. The other thread that you referenced was
me attempting it in 1.3.0 to see if the issue was related to 1.2.1. I did
not build Spark but used the version from
What version of Cassandra are you using? Are you using DSE or the stock
Apache Cassandra version? I have connected it with DSE, but have not
attempted it with the standard Apache Cassandra version.
FWIW,
in Tableau using the ODBC driver that comes with DSE. Once
you connect, Tableau allows you to use a C* keyspace as a schema and column
families as tables.
Mohammed
*From:* pawan kumar [mailto:pkv...@gmail.com]
*Sent:* Friday, April 3, 2015 7:41 AM
*To:* Todd Nist
*Cc:* user@spark.apache.org; Mohammed
@Pawan
Not sure if you have seen this or not, but here is a good example by
Jonathan Lacefield of Datastax on hooking up SparkSQL with DSE; adding
Tableau is as simple as Mohammed stated with DSE.
https://github.com/jlacefie/sparksqltest.
HTH,
Todd
On Fri, Apr 3, 2015 at 2:39 PM, Todd Nist
Can you simply apply the
https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.util.StatCounter
to this? You should be able to do something like this:
val stats = rdd.map(x => x._2).stats()
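For reference, stats() returns a StatCounter, so you can read the summary values directly; a small self-contained sketch (the sample data is made up):

// Compute summary statistics over the values of a (key, value) RDD.
val sample = sc.parallelize(Seq(("a", 1.0), ("b", 2.0), ("c", 3.0)))
val summary = sample.map(_._2).stats()
println(s"count=${summary.count} mean=${summary.mean} stdev=${summary.stdev} min=${summary.min} max=${summary.max}")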
-Todd
On Tue, Apr 28, 2015 at 10:00 AM, subscripti...@prismalytics.io
I'm very perplexed by the following. I have a set of Avro-generated
objects that are sent to a Spark Streaming job via Kafka. The Spark Streaming
job follows the receiver-based approach. I am encountering the below error
when I attempt to deserialize the payload:
15/04/30 17:49:25 INFO
*Resending as I do not see that this made it to the mailing list; sorry if
in fact it did and is just not reflected online yet.*
I'm very perplexed by the following. I have a set of Avro-generated
objects that are sent to a Spark Streaming job via Kafka. The Spark Streaming
job follows the
Are you using Kryo or Java serialization? I found this post useful:
http://stackoverflow.com/questions/23962796/kryo-readobject-cause-nullpointerexception-with-arraylist
If using Kryo, you need to register the classes with Kryo on your SparkConf,
something like this:
conf.registerKryoClasses(Array(
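A fuller sketch of the registration on a SparkConf; SensorReading is a placeholder standing in for your own Avro-generated class:

import org.apache.spark.SparkConf

// Placeholder for whatever type actually flows through your job.
case class SensorReading(id: String, value: Double)

val conf = new SparkConf()
  .setAppName("kryo-registration-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[SensorReading]))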
Have you tried to set the following?
spark.worker.cleanup.enabled=true
spark.worker.cleanup.appDataTtl=<seconds>
On Thu, May 7, 2015 at 2:39 AM, Taeyun Kim taeyun@innowireless.com
wrote:
Hi,
After a Spark program completes, there are 3 temporary directories remaining
in the temp
Hi,
I have a DataFrame that represents my data and looks like this:
+----------+-----------+
| col_name | data_type |
+----------+-----------+
| obj_id   | string    |
| type     | string    |
| name
I believe what Dean Wampler was suggesting is to use the sqlContext, not the
sparkContext (sc); the sqlContext is where the createDataFrame function resides:
https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.sql.SQLContext
HTH.
-Todd
On Wed, May 13, 2015 at 6:00 AM, SLiZn Liu
I think the docs are correct. If you follow the example from the docs and add
the import shown below, I believe you will get what you're looking for:
// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
You could also simply take your rdd and do the following:
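Roughly like this, assuming a SQLContext named sqlContext and the shell's sc; the Person case class is just an illustration:

// With sqlContext.implicits._ in scope, an RDD of case classes converts via toDF().
case class Person(name: String, age: Int)

import sqlContext.implicits._
val df = sc.parallelize(Seq(Person("Andy", 30), Person("Justin", 19))).toDF()
df.show()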
In 1.2.1 I was persisting a set of parquet files as a table for use by the
spark-sql CLI later on. There was a post here
http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311
by
Michael Armbrust that provides a nice little helper method for dealing
are in the remote node. I
am not sure if I need to install Spark and its dependencies on the web UI
(Zeppelin) node.
I am not sure talking about Zeppelin in this thread is right.
Thanks once again for all the help.
Thanks,
Pawan Venugopal
On Fri, Apr 3, 2015 at 11:48 AM, Todd Nist tsind
CalliopeServer2, which works like a charm with BI tools that
use JDBC, but unfortunately Tableau throws an error when it connects to it.
Mohammed
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent:* Friday, April 3, 2015 11:39 AM
*To:* pawan kumar
*Cc:* Mohammed Guller; user@spark.apache.org
To use the HiveThriftServer2.startWithContext, I thought one would use the
following artifact in the build:
"org.apache.spark" %% "spark-hive-thriftserver" % "1.3.0"
But I am unable to resolve the artifact. I do not see it in maven central
or any other repo. Do I need to build Spark and
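For reference, the call being discussed looks roughly like this; a sketch only, with the example JSON file standing in for whatever you actually want to expose:

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Expose an in-memory DataFrame as a temp table through the thrift server.
val hiveContext = new HiveContext(sc)
val df = hiveContext.jsonFile("examples/src/main/resources/people.json")
df.registerTempTable("people")
HiveThriftServer2.startWithContext(hiveContext)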
. If you want the
specific jar, you could look for jackson or the JSON SerDe in it.
Thanks
Best Regards
On Thu, Apr 2, 2015 at 12:49 AM, Todd Nist tsind...@gmail.com wrote:
I have a feeling I'm missing a jar that provides the support, or
this may be related to https://issues.apache.org/jira
I was trying a simple test from the spark-shell to see if 1.3.0 would
address a problem I was having with locating the json_tuple class and got
the following error:
scala> import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive._

scala> val sqlContext = new HiveContext(sc)
sqlContext:
down where
the dependency was coming from. Based on Patrick's comments it sounds like
this is now resolved.
Sorry for the confusion.
-Todd
On Wed, Apr 8, 2015 at 4:38 PM, Todd Nist tsind...@gmail.com wrote:
Hi Mohammed,
I think you just need to add -DskipTests to your build. Here is how I
built
org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
[error] Total time: 106 s, completed Apr 8, 2015 12:33:45 PM
Mohammed
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Wednesday, April 8, 2015 11:54 AM
*To:* Mohammed Guller
*Cc:* Todd Nist; James Aley; user; Patrick
I believe you're looking for df.na.fill in Scala; in the PySpark module it is
fillna (http://spark.apache.org/docs/latest/api/python/pyspark.sql.html).
From the docs:
df4.fillna({'age': 50, 'name': 'unknown'}).show()
age height name
10  80     Alice
5   null   Bob
50  null   Tom
50  null   unknown
On
You may want to look at this tooling for helping identify performance
issues and bottlenecks:
https://github.com/kayousterhout/trace-analysis
I believe this is slated to become part of the web ui in the 1.4 release,
in fact based on the status of the JIRA,
There used to be a project, StreamSQL
(https://github.com/thunderain-project/StreamSQL), but it appears a bit
dated and I do not see it in the Spark repo, though I may have missed it.
@TD Is this project still active?
I'm not sure what the status is but it may provide some insights on how to
achieve
You can get HDP with at least 1.3.1 from Horton:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/
for your convenience, from the docs:
wget -nv
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo
-O /etc/yum.repos.d/HDP-TP.repo
Hi Gaurav,
Seems like you could use a broadcast variable for this if I understand your
use case. Create it in the driver based on the CommandLineArguments and
then use it in the workers.
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
So something like:
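A minimal sketch; the setting name and sample data are placeholders, not from the original thread:

// Broadcast values derived from the command-line arguments to the executors.
val cmdLineSettings = Map("minLength" -> 5)
val settingsBc = sc.broadcast(cmdLineSettings)

val words = sc.parallelize(Seq("spark", "broadcast", "rdd"))
val longWords = words.filter(_.length >= settingsBc.value("minLength"))   // read the broadcast value on the workers
longWords.collect().foreach(println)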
Hi Proust,
Is it possible to see the query you are running, and can you run EXPLAIN
EXTENDED to show the physical plan
for the query. To generate the plan you can do something like this from
$SPARK_HOME/bin/beeline:
0: jdbc:hive2://localhost:10001> explain extended select * from
YourTableHere;
It was released yesterday.
On Friday, June 12, 2015, ayan guha guha.a...@gmail.com wrote:
Hi
When is official spark 1.4 release date?
Best
Ayan
to be a limitation at this time.
-Todd
On Thu, Jul 2, 2015 at 4:13 PM, Mulugeta Mammo mulugeta.abe...@gmail.com
wrote:
Thanks, but my use case requires that I specify different start and max heap
sizes. It looks like Spark sets the start and max sizes to the same value.
On Thu, Jul 2, 2015 at 1:08 PM, Todd Nist tsind
You should use:
spark.executor.memory
from the docs https://spark.apache.org/docs/latest/configuration.html:
spark.executor.memory (default 512m): Amount of memory to use per executor
process, in the same format as JVM memory strings (e.g. 512m, 2g).
-Todd
On Thu, Jul 2, 2015 at 3:36 PM, Mulugeta Mammo
I'm using the spark-cassandra-connector from DataStax in a spark streaming
job launched from my own driver. It is connecting to a standalone cluster
on my local box which has two workers running.
This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
added the following entry to
://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...
On Fri, May 22, 2015 at 6:15 PM, Todd Nist tsind...@gmail.com wrote:
I'm using the spark-cassandra-connector from DataStax in a Spark
streaming job launched from my own driver. It is connecting to a standalone
cluster on my local box which
From the docs,
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence:
Storage Level: MEMORY_ONLY
Meaning: Store RDD as deserialized Java objects in the JVM. If the RDD does
not fit in memory, some partitions will not be cached and will be recomputed
on the fly each time they're
on a streaming app ?
Thanks again.
Daniel
On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist tsind...@gmail.com wrote:
Hi Daniel,
It is possible to create an instance of the SparkSQL Thrift server;
however, it seems like this project is what you may be looking for:
https://github.com/Intel-bigdata/spark
They are covered here in the docs:
http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$
On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wanglong_...@163.com wrote:
Hi All,
I am using Spark 1.4.1, and I want to know how I can find the
complete function
Hi Daniel,
It is possible to create an instance of the SparkSQL Thrift server; however,
it seems like this project is what you may be looking for:
https://github.com/Intel-bigdata/spark-streamingsql
Not 100% sure what your use case is, but you can always convert the data into
a DF and then issue a query, as sketched below.
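A rough sketch using the Spark 1.3-era API; the DStream named stream, the Event record type, and the "events" table name are all assumptions for illustration:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

case class Event(id: String, value: Double)   // placeholder record type

// Assuming a DStream[Event] named stream: turn each micro-batch into a DataFrame and query it.
stream.foreachRDD { (rdd: RDD[Event]) =>
  val sqlContext = new SQLContext(rdd.sparkContext)   // in practice, reuse a singleton SQLContext
  import sqlContext.implicits._
  rdd.toDF().registerTempTable("events")
  sqlContext.sql("SELECT count(*) FROM events").show()
}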
Did you take a look at the excellent write up by Yin Huai and Michael
Armbrust? It appears that rank is supported in the 1.4.x release.
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
Snippet from above article for your convenience:
To answer the first
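A condensed sketch in the spirit of that article; df and its category/revenue columns are placeholders, and this requires Spark 1.4+:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, desc}

// Rank rows within each category by descending revenue.
val w = Window.partitionBy("category").orderBy(desc("revenue"))
val ranked = df.withColumn("rank", rank().over(w))
ranked.show()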
There are three connector packages listed on the Spark Packages web site:
http://spark-packages.org/?q=hbase
HTH.
-Todd
On Wed, Jul 15, 2015 at 2:46 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I have a requirement of writing to an HBase table from a Spark Streaming app
after some
There is one package available on the spark-packages site,
http://spark-packages.org/package/Stratio/RabbitMQ-Receiver
The source is here:
https://github.com/Stratio/RabbitMQ-Receiver
Not sure whether that meets your needs or not.
-Todd
On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele
Hi Yifan,
You could also try increasing spark.kryoserializer.buffer.max.mb
(64 MB by default): useful if your default buffer size goes beyond 64 MB.
Per the doc:
Maximum allowable size of Kryo serialization buffer. This must be larger
than any object
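Setting it programmatically would look something like this; the value shown is just an example:

import org.apache.spark.SparkConf

// Raise the max Kryo buffer (key name as used in the 1.x docs quoted above).
val conf = new SparkConf().set("spark.kryoserializer.buffer.max.mb", "512")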
2.11 artifacts are in fact published:
> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22
>
> On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist <tsind...@gmail.com> wrote:
> > Sorry Sean, you are absolutely right, it supports 2.11; all I meant is
> there is
> >
I issued the same basic command and it worked fine.
RADTech-MBP:spark $ ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn
-Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests
Which created: spark-1.6.0-SNAPSHOT-bin-hadoop-2.6.tgz in the root
directory of the project.
Hi Bilnmek,
Spark 1.5.x does not support Scala 2.11.7, so the easiest thing to do is
build it like you're trying. Here are the steps I followed to build it in a
Mac OS X 10.10.5 environment; it should be very similar on Ubuntu.
1. Set the JAVA_HOME environment variable in my bash session via export
t support 2.11? It does.
>
> It is not even this difficult; you just need a source distribution,
> and then run "./dev/change-scala-version.sh 2.11" as you say. Then
> build as normal
>
> On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist <tsind...@gmail.com
> <javascrip
From Tableau, you should be able to use the Initial SQL option to support
this:
So in Tableau add the following to the “Initial SQL”
create function myfunc AS 'myclass'
using jar 'hdfs:///path/to/jar';
HTH,
Todd
On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar
I would strongly encourage you to read the docs; they are very useful in
getting up and running:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md
For your use case shown above, you will need to ensure that you include the
appropriate version of the
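Once the connector dependency is on the classpath, the quick-start pattern is roughly as follows; the keyspace/table names and the host are placeholders:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cassandra-example")
  .set("spark.cassandra.connection.host", "127.0.0.1")
val sc = new SparkContext(conf)

// Read a table as an RDD of CassandraRow, then write a pair RDD back out.
val rdd = sc.cassandraTable("test", "kv")
println(rdd.count())
sc.parallelize(Seq(("key1", 1))).saveToCassandra("test", "kv", SomeColumns("key", "value"))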
foreachRDD returns a Unit:
def foreachRDD(foreachFunc: (RDD[T]) ⇒ Unit): Unit
(https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html)
Apply a function to each RDD in this DStream. This is an output operator,
so 'this' DStream will be registered as an output stream and
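A typical usage sketch; dstream is assumed to be an existing DStream:

// foreachRDD is for side effects; it returns Unit, so do the work inside the closure.
dstream.foreachRDD { rdd =>
  println(s"Records in this batch: ${rdd.count()}")
}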
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming
-Todd
On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh <
gurvinder.si...@uninett.no> wrote:
> On 09/10/2015 07:42 AM, Tathagata Das wrote:
> > Rewriting is necessary. You will have to
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi
CEP engine. I have not used the below, but they may be of some value to
you:
http://stratio.github.io/streaming-cep-engine/
https://github.com/Stratio/streaming-cep-engine
HTH.
-Todd
On Sun, Sep 13, 2015 at 7:49 PM,
Hi Kali,
If you do not mind sending JSON, you could do something like this, using
json4s:
val rows = p.collect() map ( row => TestTable(row.getString(0),
row.getString(1)) )
val json = parse(write(rows))
producer.send(new KeyedMessage[String, String]("trade", writePretty(json)))
// or for
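For the above to compile you would also need the json4s imports and an implicit Formats in scope; a sketch assuming the json4s "native" backend:

import org.json4s._
import org.json4s.native.JsonMethods.parse
import org.json4s.native.Serialization.{write, writePretty}

// Needed by write/writePretty for case-class serialization.
implicit val formats = DefaultFormats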
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in
1.6.
On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> The one coming with spark 1.5.2.
>
>
>
> y
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* December-15-15 1:59 PM
Another possible alternative is to register a StreamingListener and then
reference the BatchInfo.numRecords; good example here,
https://gist.github.com/akhld/b10dc491aad1a2007183.
After registering the listener, simply implement the appropriate "onEvent"
method, where onEvent is onBatchStarted,
Sorry, did not see your update until now.
On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Yasemin,
>
> What version of Spark are you using? Here is the reference, it is off of
> the DataFrame
> https://spark.apache.org/docs/lates
that Todd mentioned, or I can't find it.
> The code and error are in gist
> <https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check
> it out please?
>
> Best,
> yasemin
>
> 2016-01-08 18:23 GMT+02:00 Todd Nist <tsind...@gmail.com>:
>
>> It
That should read "I think you're missing the --name option". Sorry about
that.
On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Jade,
>
> I think you "--name" option. The makedistribution should look like this:
>
> ./make-distr
Hi Jade,
I think you "--name" option. The makedistribution should look like this:
./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6
-Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests.
As for why it failed to build with Scala 2.11, did you run the
i.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR] mvn -rf :spark-launcher_2.10
>
> Do you think it's a Java problem? I'm using Oracle JDK 1.7. Should I update
> it to
Hi Abhi,
You should be able to register an
org.apache.spark.streaming.scheduler.StreamingListener.
There is an example here that may help:
https://gist.github.com/akhld/b10dc491aad1a2007183 and the spark api docs
here,
override def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted): Unit = {
  println("Start time: " + batchSubmitted.batchInfo.processingStartTime)
}
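A more complete sketch of a listener registered on the StreamingContext; ssc is assumed to already exist, and this variant uses onBatchCompleted to read numRecords:

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Report the record count of each completed batch.
class BatchCountListener extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    println(s"Records in batch: ${batchCompleted.batchInfo.numRecords}")
  }
}

// Register it before starting the StreamingContext.
ssc.addStreamingListener(new BatchCountListener)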
Sorry for the confusion.
-Todd
On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Abhi,
>
> You s