Oh, I forgot to add the managed libraries and the Hive libraries to the
CLASSPATH. As soon as I did that, we were good to go.
On August 29, 2014 at 22:55:47, Denny Lee (denny.g@gmail.com) wrote:
My issue is similar to the issue as noted
I'm following the latest documentation on configuring a cluster on EC2
(http://spark.apache.org/docs/latest/ec2-scripts.html). Running
./spark-ec2 -k Blah -i .ssh/Blah.pem -s 2 launch spark-ec2-test
produces a generic timeout error coming from
File ./spark_ec2.py, line 717, in real_main
I want to run Hive on Spark and YARN clusters; the Hive Metastore is stored
in MySQL.
I compiled the Spark code with:
sh make-distribution.sh --hadoop 2.4.1 --with-yarn --skip-java-test --tgz
--with-hive
My HQL code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import
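For reference, a minimal sketch of what such an HQL-over-Spark driver typically looks like in Spark 1.0.x built with --with-hive; the object name, table name, and query below are hypothetical, and HiveContext is assumed since the metastore is MySQL-backed Hive:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch: run an HQL statement through HiveContext against the
// Hive metastore (stored in MySQL, as configured in hive-site.xml).
// The table name and query are hypothetical.
object HqlOnSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HqlOnSpark"))
    val hiveContext = new HiveContext(sc)
    hiveContext.hql("SELECT COUNT(*) FROM some_table").collect().foreach(println)
    sc.stop()
  }
}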
I'm no expert, but as I understand it, yes, you create multiple streams to
consume multiple partitions in parallel. If they're all in the same
Kafka consumer group, you'll get exactly one copy of each message, so
yes, if you have 10 consumers and 3 Kafka partitions I believe only 3
will be getting
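A minimal sketch of that pattern in Spark Streaming 1.x; the ZooKeeper address, consumer group, and topic names are placeholders, and the spark-streaming-kafka artifact is assumed to be on the classpath:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Create one receiver-based stream per Kafka partition, all in the same
// consumer group, then union them into a single DStream for processing.
val ssc = new StreamingContext(new SparkConf().setAppName("KafkaParallelConsume"), Seconds(10))
val numStreams = 3  // e.g. one stream per Kafka partition
val streams = (1 to numStreams).map { _ =>
  KafkaUtils.createStream(ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))
}
val unified = ssc.union(streams)
unified.map(_._2).count().print()  // number of messages per batch
ssc.start()
ssc.awaitTermination()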
Hi Michael,
Thank you so much!!
I have tried changing the following key lengths from 256 to 255 and from 767 to
766, but it still didn't work:
alter table COLUMNS_V2 modify column COMMENT VARCHAR(255);
alter table INDEX_PARAMS modify column PARAM_KEY VARCHAR(255);
alter table SD_PARAMS modify column
It bundles all of the sources from
https://github.com/apache/spark/tree/master/examples together, and, if I'm
not wrong, it also uses the pom file to get the dependency list.
Thanks
Best Regards
On Fri, Aug 29, 2014 at 12:39 AM, filipus floe...@gmail.com wrote:
hey guys
I'm still trying to get used to
bq. how was the spark...example...jar file built?
You can use the following command to build against Hadoop 2.4:
mvn -Phadoop-2.4,yarn -Dhadoop.version=2.4.1 -DskipTests clean package
The examples jar can be found under examples/target.
Cheers
On Sat, Aug 30, 2014 at 6:54 AM, Akhil Das
I'm trying to get used to sbt in order to build standalone applications by myself.
I managed to run the SimpleApp example.
Then I tried to copy an example Scala program, like LinearRegression, into a
local directory laid out as follows (a build.sbt sketch is given after the listing):
.
./build.sbt
./src
./src/main
./src/main/scala
./src/main/scala/LinearRegression.scala
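For reference, a hedged sketch of what the build.sbt for this layout might contain; the project name and versions are illustrative for the Spark 1.0.x era discussed in this thread:

// Hypothetical build.sbt for a standalone app using MLlib's LinearRegression example.
name := "linear-regression-example"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.0.2",
  "org.apache.spark" %% "spark-mllib" % "1.0.2"
)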
When programming in Hadoop it is possible to guarantee that:
1) All keys sent to a specific partition will be handled by the same
machine (thread)
2) All keys received by a specific machine (thread) will be received in
sorted order
3) These conditions will hold even if the values associated with a
Compilation works, but execution does not, at least with spark-submit as I described
above.
When I make a local copy of the training set, I can execute sbt run on the file,
which works:
sbt run sample_linear_regression_data.txt
But when I do
sbt run ~/git/spark/data/mllib/sample_linear_regression_data.txt
the
Did you run sbt under /home/filip/spark-ex-regression?
'~/git/spark/data/mllib/sample_linear_regression_data.txt' was interpreted
as rooted under /home/filip/spark-ex-regression.
Cheers
On Sat, Aug 30, 2014 at 9:28 AM, filipus floe...@gmail.com wrote:
compilation works but execution not at
OK, I see :-)
.. instead of ~ works fine, so
do you know the reason why
sbt run [options] works
after sbt package
but
spark-submit --class ClassName --master local[2]
target/scala/JarPackage.jar [options]
doesn't?
It cannot resolve everything somehow.
In 1.1, you'll be able to get all of these properties using sortByKey, and then
mapPartitions on top to iterate through the key-value pairs. Unfortunately
sortByKey does not let you control the Partitioner, but it's fairly easy to
write your own version that does if this is important.
In
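A rough sketch of that sortByKey-then-mapPartitions pattern, assuming a pair RDD; the data and application name here are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed in Spark 1.x)

// Sort by key, then walk each partition's key-value pairs in sorted order.
val sc = new SparkContext(new SparkConf().setAppName("SortedPartitions"))
val pairs = sc.parallelize(Seq(("b", 2), ("a", 1), ("c", 3)))
val sorted = pairs.sortByKey()  // keys are ordered within (and across) partitions
val summaries = sorted.mapPartitions { iter =>
  iter.map { case (k, v) => s"$k -> $v" }  // pairs arrive in sorted key order
}
summaries.collect().foreach(println)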
Oh, you may actually be running into an issue with your MySQL setup. Try running
alter database metastore_db character set latin1
so that Hive (and the Spark HiveContext) can execute properly against the
metastore.
On August 29, 2014 at 04:39:01, arthur.hk.c...@gmail.com
Hi,
Already done, but I still get the same error:
(I use Hive 0.13.1, Spark 1.0.2, Hadoop 2.4.1)
Steps:
Step 1) mysql:
alter database hive character set latin1;
Step 2) HIVE:
hive> create table test_datatype2 (testbigint bigint);
OK
Time taken: 0.708 seconds
hive> drop table test_datatype2;
Hi,
I have a few questions about my Spark Master and Slave setup:
Here I have 5 Hadoop nodes (n1, n2, n3, n4, and n5 respectively); at the
moment I run Spark on these nodes:
n1:Hadoop Active Name node, Hadoop Slave
Spark Active Master
Hi,
Is there any formula to calculate proper RAM allocation values for Spark and
Shark based on physical RAM and the Hadoop and HBase RAM usage?
e.g. if a node has 32GB of physical RAM:
spark-defaults.conf
spark.executor.memory ?g
spark-env.sh
export SPARK_WORKER_MEMORY=?
export
I'm using CDH 5.1.0, which bundles Spark 1.0.0.
Following "How-to: Run a Simple Apache Spark App in CDH 5"
(http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/),
I tried to submit my job in local mode, Spark Standalone mode, and YARN
mode. I successfully
I have this same question. Isn't there somewhere that the Kafka range
metadata can be saved? From my naive perspective, it seems like it should
be very similar to HDFS lineage. The original HDFS blocks are kept
somewhere (in the driver?) so that if an RDD partition is lost, it can be
Hi,
Could you please add Asiainfo to the Powered By Spark page?
Thanks
Asiainfo
www.asiainfo.com
Core, SQL, Streaming, MLlib, GraphX
We leverage Spark and the Hadoop ecosystem to build cost-effective data center
solutions for our customers in the telecom industry as well as other industrial
sectors.
A couple of things to add here:
1) You can import the
org.apache.spark.streaming.dstream.PairDStreamFunctions implicit, which adds
a whole ton of functionality to DStream itself. This lets you work at the
DStream level versus digging into the underlying RDDs (see the sketch after this list).
2) You can use ssc.fileStream(directory)
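A minimal sketch of working at the DStream level, assuming Spark Streaming 1.x; the application name and directory path are placeholders, and textFileStream is used here as the simple text variant of fileStream:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._  // pair-DStream implicits in Spark 1.x

// Monitor a directory for new text files and count words per batch,
// using reduceByKey directly on the DStream instead of digging into RDDs.
val ssc = new StreamingContext(new SparkConf().setAppName("DStreamLevelOps"), Seconds(30))
val lines = ssc.textFileStream("/tmp/incoming")  // hypothetical directory
val wordCounts = lines.flatMap(_.split("\\s+"))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()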
Reading about RDD Persistence
(https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence),
I learned that the storage level MEMORY_AND_DISK means "Store RDD as
deserialized Java objects in the JVM. If the RDD does not fit in memory,
store the partitions that don't fit on disk,
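A minimal sketch of requesting that storage level explicitly; the data and application name are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Cache the RDD in memory as deserialized objects and spill any
// partitions that do not fit to disk.
val sc = new SparkContext(new SparkConf().setAppName("PersistenceDemo"))
val rdd = sc.parallelize(1 to 1000000)
rdd.persist(StorageLevel.MEMORY_AND_DISK)
println(rdd.count())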
You can view the Locality Level of each task within a stage by using the
Spark Web UI under the Stages tab.
The levels are as follows (in order of decreasing desirability):
1) PROCESS_LOCAL - data was found directly in the executor JVM
2) NODE_LOCAL - data was found on the same node as the executor
I'd be interested to understand this mechanism as well, but this is the
error-recovery part of the equation. Consuming from Kafka has two aspects,
parallelism and error recovery, and I am not sure how either works. For
error recovery, I would like to understand how:
- A failed receiver gets