Hi,
I have already found a way to do “INSERT INTO HIVE_TABLE VALUES (…)”.
Regards
Arthur
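For the archives: one way this can be done (a sketch, and not necessarily
the solution Arthur found) is to stage the values in a temporary table and
use INSERT INTO ... SELECT, which Hive 0.12 does accept. The table and
context names below are made up.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD

// stage the row(s) to insert as a temporary table
case class KV(key: Int, value: String)
val rows = sc.parallelize(Seq(KV(1, "one")))
rows.registerTempTable("tmp_rows")

// INSERT INTO ... SELECT works on Hive 0.12, unlike INSERT INTO ... VALUES
hiveContext.sql("INSERT INTO TABLE hive_table SELECT key, value FROM tmp_rows")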
On 18 Oct, 2014, at 10:09 pm, Cheng Lian lian.cs@gmail.com wrote:
Currently Spark SQL uses Hive 0.12.0, which doesn't support the INSERT INTO
... VALUES ... syntax.
On 10/18/14 1:33 AM,
Hi All
We are trying to create a table in Hive using the spark-assembly-1.0.2.jar file.
JavaSparkContext sc = CC2SparkManager.sharedInstance().getSparkContext();
JavaHiveContext sqlContext = new JavaHiveContext(sc);
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)");
I tried that already, same exception. I also tried using an accumulator to
collect all filenames. The filename is not the problem.
Even this crashes with the same exception:
sc.parallelize(files.value).map { fileName =>
  println(s"Scanning $fileName")
  try {
    println(s"Scanning
Hi,
Suppose I have a stream of logs and I want to count them by minute.
The result is like:
2014-10-26 18:38:00 100
2014-10-26 18:39:00 150
2014-10-26 18:40:00 200
One way to do this is to set the batch interval to 1 min, but each
batch would be quite large.
Or I can use updateStateByKey where
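For the small-batch approach, a minimal sketch (the socket source, the
timestamp format, and the extractMinute helper are all assumptions):

import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// assumed: each log line starts with a "yyyy-MM-dd HH:mm:ss" timestamp
def extractMinute(line: String): String = line.substring(0, 16)

val ssc = new StreamingContext(sc, Seconds(10)) // small batches
val logs = ssc.socketTextStream("localhost", 9999) // assumed source
val countsPerMinute = logs
  .map(line => (extractMinute(line), 1L))
  .reduceByKeyAndWindow(_ + _, Minutes(1), Minutes(1))
countsPerMinute.print()
ssc.start()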
Sorry, I forgot to say that this gives the above error only when run on a
cluster, not in local mode.
My understanding is that Spark SQL allows one to access Spark data as if it
were stored in a relational database. It compiles SQL queries into a
series of calls to the Spark API.
I need the performance of a SQL database, but I don't care about doing
queries with SQL.
I create the input to MLlib by
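To illustrate what that description looks like in code, a minimal sketch
along the lines of the Spark SQL guide (the file name and schema are made
up):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

case class Person(name: String, age: Int)
val people = sc.textFile("people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

people.registerTempTable("people")
// the query compiles down to ordinary Spark transformations
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")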
Hi,
You can use Redis to store the keys and values as counts by doing an update
whenever you receive that minute key; being an in-memory database, it would
be faster than SQL. You can do an update at the end of each batch to update
the count of the key if it exists, or create it in case
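A minimal sketch of that idea, assuming the Jedis client library, a local
Redis, and a DStream of (minuteKey, count) pairs:

import org.apache.spark.streaming.dstream.DStream
import redis.clients.jedis.Jedis

def saveCountsToRedis(counts: DStream[(String, Long)]): Unit = {
  counts.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      val jedis = new Jedis("localhost") // assumed Redis host
      partition.foreach { case (minuteKey, count) =>
        // INCRBY creates the key if it does not exist, so this is
        // update-if-exists, create-otherwise in one atomic call
        jedis.incrBy(minuteKey, count)
      }
      jedis.close()
    }
  }
}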
Spark's API definitely covers all of the things that a relational database
can do. It will probably outperform a relational star schema if your entire
*working* data set can fit into RAM on your cluster. It will still perform
quite well if most of the data fits and some has to spill over to disk.
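As a sketch of what that looks like in code (the path and record layout
are made up), MEMORY_AND_DISK keeps the partitions that fit in RAM and
spills the rest to local disk instead of recomputing them:

import org.apache.spark.storage.StorageLevel

val facts = sc.textFile("hdfs:///warehouse/facts")
  .map(_.split('\t'))
  .persist(StorageLevel.MEMORY_AND_DISK)

// repeated "queries" then run against the cached working set
val totalsByKey = facts.map(f => (f(0), f(1).toLong)).reduceByKey(_ + _)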
Just tried the code below and it works for me; I'm not sure why sparkContext
is being sent inside the mapPartitions function in your case. Can you try
with a simple map() instead of mapPartitions?
val ac = sc.accumulator(0)
val or = sc.parallelize(1 to 1)
val ps = or.map(x => (x, x + 2)).map(x => ac += 1)
@Peter - as Rick said - Spark's main usage is data analysis and not
storage.
Spark allows you to plugin different storage layers based on your use cases
and quality attribute requirements. So, in essence, if your relational
database is meeting your storage requirements, you should think about how to
Hi,
It is very easy to integrate Cassandra in a use case such as this. For
instance, do your joins in Spark and your data storage in Cassandra, which
allows a very flexible schema, unlike a relational DB, and is much faster,
fault tolerant, and with Spark and colocation WRT data
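A minimal sketch of that pattern using the DataStax
spark-cassandra-connector (an external dependency; the keyspace, table,
and data are made up):

import com.datastax.spark.connector._

case class Enriched(customerId: Int, order: String, customer: String)

val orders = sc.parallelize(Seq((1, "order-42")))  // (customerId, order)
val customers = sc.parallelize(Seq((1, "alice")))  // (customerId, name)

// do the join in Spark, store the result in Cassandra
val enriched = orders.join(customers).map {
  case (id, (order, name)) => Enriched(id, order, name)
}
enriched.saveToCassandra("shop", "orders_by_customer",
  SomeColumns("customer_id", "order", "customer"))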
In Tuning Spark https://spark.apache.org/docs/latest/tuning.html, it says,
Spark automatically includes Kryo serializers for the *many commonly-used
core Scala classes* covered in the AllScalaRegistrar from the Twitter chill
https://github.com/twitter/chill library.
I looked into the
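For reference, a minimal sketch of layering your own registrations on top
of those defaults (the class names are placeholders):

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

case class MyClass(id: Int, payload: String) // placeholder for your type

class MyRegistrator extends KryoRegistrator {
  // runs in addition to the AllScalaRegistrar defaults Spark applies
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[MyClass])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator") // fully qualify in real code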
Hi Akhil,
Please see this related message.
http://apache-spark-user-list.1001560.n3.nabble.com/Bug-in-Accumulators-td17263.html
I am curious if this works for you also.
Hi all,
This feels like a dumb question but bespeaks my lack of understanding: what
is the Spark thrift-server for? Especially if there's an existing Hive
installation.
Background:
We want to use Spark to do some processing starting from files (in probably
MapRFS). We want to be able to read the
I wonder if there is any tool to tune Spark (worker and master).
I have 6 workers (192 GB RAM, 32 CPU cores each) with 2 masters and see
only a small difference between Hadoop MapReduce and Spark.
I've tested word count on a 50 GB file. During tests Spark hung on 2 nodes
for a few minutes with
This is very experimental and mostly unsupported, but you can start the
JDBC server from within your own programs
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45
by
passing it the HiveContext.
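A minimal sketch of that approach (the cached table name is illustrative;
this assumes the startWithContext entry point at the line linked above):

import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

val hiveContext = new HiveContext(sc)
hiveContext.sql("CACHE TABLE src") // JDBC clients then see this cached table
HiveThriftServer2.startWithContext(hiveContext)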
On
I'm a newbie with Spark. After installing it on all the machines I want to
use, do I need to tell it about the Hadoop configuration, or will it be able
to find it by itself?
Thank you,
I agree with Soumya. A relational database is usually the worst kind of
database to receive a constant event stream.
That said, the best solution is one that already works :)
If your system is meeting your needs, then great. When you get so many
events that your db can't keep up, I'd look into
Debasish Das writes:
If the SVM is not already migrated to LBFGS, that's the first thing you
should try... Basically, following the LBFGS logistic regression, come up
with an LBFGS-based linear SVM...
About integrating TRON in MLlib, David already has a version of TRON in
breeze, but someone
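A sketch of what an LBFGS-based linear SVM could look like, reusing
MLlib's existing optimizer pieces (the parameter values are arbitrary):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{HingeGradient, LBFGS, SquaredL2Updater}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def trainLbfgsSvm(training: RDD[LabeledPoint], numFeatures: Int) = {
  val data = training.map(lp => (lp.label, lp.features))
  val (weights, lossHistory) = LBFGS.runLBFGS(
    data,
    new HingeGradient(),     // hinge loss makes it an SVM
    new SquaredL2Updater(),  // L2 regularization
    10,                      // number of LBFGS corrections
    1e-4,                    // convergence tolerance
    100,                     // max iterations
    0.1,                     // regularization parameter
    Vectors.dense(new Array[Double](numFeatures)))
  weights
}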
Hi,
I am using Spark 1.1.0 configured with the STANDALONE cluster manager and
CLUSTER deploy mode. I want to submit multiple jars with spark-submit using
the --jars option, but I got a ClassNotFoundException. By the way, in my
code I also use the thread context class loader to load
Any update?
I encountered the same issue in my environment.
Here are my usual steps:
git clone https://github.com/apache/spark
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
The build succeeds with Maven.
Then I import it into IDEA as a Maven project and click Build > Make
Yes, it is necessary to do a mvn clean when encountering this issue.
Typically you would have changed one or more of the profiles/options, which
leads to this occurring.
2014-10-22 22:00 GMT-07:00 Ryan Williams ryan.blake.willi...@gmail.com:
I started building Spark / running Spark tests this
I heard from one person offline who regularly builds Spark on OSX and Linux
and they felt like they only ever saw this error on OSX; if anyone can
confirm whether they've seen it on Linux, that would be good to know.
Stephen: good to know re: profiles/options. I don't think changing them is a
Hi Guys,
Here are some lines from the log file before the OOM. They don't look that
helpful, so let me know if there's anything else I should be sending. I am
running in standalone mode.
spark-pulse-org.apache.spark.deploy.master.Master-1-hadoop10.pulse.io.out.5:java.lang.OutOfMemoryError: Java
Hey, I'm trying to run TPC-H Query 4 (shown below), and get the following
error:
Exception in thread "main" java.lang.RuntimeException: [11.25] failure:
``UNION'' expected but `select' found
It seems like Spark SQL doesn't support the exists clause. Is this true?
select
o_orderpriority,
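If EXISTS is indeed unsupported, one workaround worth trying (a sketch;
the dates are the standard Q4 parameters, and whether the extra predicate
in the ON clause is accepted needs testing) is to rewrite the correlated
EXISTS as a LEFT SEMI JOIN:

sqlContext.sql("""
  SELECT o_orderpriority, count(*) AS order_count
  FROM orders
  LEFT SEMI JOIN lineitem
    ON l_orderkey = o_orderkey AND l_commitdate < l_receiptdate
  WHERE o_orderdate >= '1993-07-01' AND o_orderdate < '1993-10-01'
  GROUP BY o_orderpriority
  ORDER BY o_orderpriority
""")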
I see the errors regularly on Linux when I have changed profiles.
2014-10-26 20:49 GMT-07:00 Ryan Williams ryan.blake.willi...@gmail.com:
I heard from one person offline who regularly builds Spark on OSX and
Linux and they felt like they only ever saw this error on OSX; if
Given that you are storing event data (which is basically things that have
happened in the past AND cannot be modified) you should definitely look at
Event sourcing.
http://martinfowler.com/eaaDev/EventSourcing.html
Agreed. In this context: a lesser-known fact is that the Lambda
Can you paste the hive-site.xml? Most of the time when I meet this
exception, it is because the JDBC driver for the Hive metastore is not set
correctly or the wrong driver classes are included in the assembly jar.
By default, the assembly jar contains derby.jar, which is the embedded
Derby JDBC driver.
From:
I have a similar requirement. But instead of grouping it by chunkSize, I
would have the timeStamp be part of the data. So the function I want has
the following signature:
// RDD of (timestamp, value)
def rddToDStream[T](data: RDD[(Long, T)], timeWindow: Long)(implicit ssc:
StreamingContext): DStream[T]
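For what it's worth, a rough sketch of such a function, under two
assumptions: the RDD's time range can be enumerated on the driver, and
replaying one window per batch interval is acceptable:

import scala.collection.mutable
import scala.reflect.ClassTag

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

def rddToDStream[T: ClassTag](data: RDD[(Long, T)], timeWindow: Long)
                             (implicit ssc: StreamingContext): DStream[T] = {
  val minTs = data.keys.min()
  val maxTs = data.keys.max()
  // one RDD per time window, replayed in order, one per batch
  val queue = new mutable.Queue[RDD[T]]()
  (minTs to maxTs by timeWindow).foreach { start =>
    queue += data
      .filter { case (ts, _) => ts >= start && ts < start + timeWindow }
      .values
  }
  ssc.queueStream(queue, oneAtATime = true)
}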