In this case I'd probably just store it as a String. Our casting rules
(which come from Hive) are such that when you use a string as a number or
boolean it will be cast to the desired type.
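As a sketch of that implicit coercion (Hive-style cast semantics; the table and column names here are made up):

```sql
-- 'age' is stored as a STRING column; using it in a numeric comparison
-- makes Spark SQL cast it to the required type automatically.
SELECT name FROM people WHERE age > 21;

-- equivalent to writing the cast explicitly:
SELECT name FROM people WHERE CAST(age AS INT) > 21;
```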
Thanks for the PR btw :)
On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan ehrann.meh...@gmail.com wrote:
This is something we are hoping to support in Spark 1.4. We'll post more
information to JIRA when there is a design.
On Thu, Mar 26, 2015 at 11:22 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
Does anyone have a similar request?
https://issues.apache.org/jira/browse/SPARK-6561
When we
Hi Jao,
Sorry to pop up this old thread. I am having the same problem you did. I
want to know if you have figured out how to improve k-means on Spark.
I am using Spark 1.2.0. My data set is about 270k vectors, each has about
350 dimensions. If I set k=500, the job takes about 3hrs on my
Hi David,
Can you also try with Spark 1.3 if possible? I believe there was a 2x
improvement on K-Means between 1.2 and 1.3.
Thanks,
Burak
On Sat, Mar 28, 2015 at 9:04 PM, davidshen84 davidshe...@gmail.com wrote:
Hi Jao,
Sorry to pop up this old thread. I am having the same problem you
I am learning Spark SQL and trying the spark-sql example. I ran the
following code, but I got the exception ERROR CliDriver:
org.apache.spark.sql.AnalysisException: cannot recognize input near
'CREATE' 'TEMPORARY' 'TABLE' in ddl statement; line 1 pos 17. I have two
questions:
1. Do we have a list of the
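For context, the temporary-table DDL documented for Spark SQL's data sources API has this form (the path here is hypothetical; whether the spark-sql CLI accepts it is exactly what the error above is about):

```sql
CREATE TEMPORARY TABLE people
USING org.apache.spark.sql.json
OPTIONS (path "examples/src/main/resources/people.json");
```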
The input file is of the format: userid, movieid, rating.
From this, I want to extract all possible combinations of movies and the
difference between the ratings for each user:
((movie1, movie2), rating(movie1) - rating(movie2))
This process should be processed for each user in the dataset. Finally, I
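The per-user pairing described here can be sketched in plain Python (the field names and toy data are made up; on Spark this would typically be a groupBy on userid followed by a flatMap over each user's movie pairs):

```python
from itertools import combinations

# ratings: userid -> {movieid: rating}, a toy stand-in for the parsed input file
ratings = {
    "u1": {"m1": 5.0, "m2": 3.0, "m3": 4.0},
}

def pair_differences(user_ratings):
    """All (movie1, movie2) pairs with rating(movie1) - rating(movie2)."""
    return [
        ((m1, m2), user_ratings[m1] - user_ratings[m2])
        for m1, m2 in combinations(sorted(user_ratings), 2)
    ]

for user, r in ratings.items():
    print(user, pair_differences(r))
```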
Please take a look at
https://spark.apache.org/docs/latest/sql-programming-guide.html
Cheers
On Mar 28, 2015, at 5:08 AM, Vincent He vincent.he.andr...@gmail.com wrote:
I am learning Spark SQL and trying the spark-sql example. I ran the following
code, but I got exception ERROR CliDriver:
Hi all,
I am working with Spark 1.0.0, mainly for the usage of GraphX, and wished to
apply some custom partitioning strategies on the edge list of the graph.
I have generated an edge list file which has the partition number after the
source and destination id in each line. Initially I am loading
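A plain-Python sketch of loading such a file (assuming the hypothetical line format `srcId dstId partitionId`, whitespace-separated, as described above):

```python
def parse_edge_line(line):
    """Parse 'srcId dstId partitionId' into ((src, dst), partition)."""
    src, dst, part = line.split()
    return (int(src), int(dst)), int(part)

# toy stand-in for the edge list file
lines = ["1 2 0", "2 3 1", "1 3 0"]
edges = [parse_edge_line(l) for l in lines]

# a custom partitioning strategy can then just look up the precomputed partition
partition_map = dict(edges)

def partition_of(edge):
    return partition_map[edge]

print(partition_of((2, 3)))
```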
See
https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html
I haven't tried the SQL statements in the above blog myself.
Cheers
On Sat, Mar 28, 2015 at 5:39 AM, Vincent He vincent.he.andr...@gmail.com
wrote:
thanks for your information. I have read it, I can run
thanks for your information. I have read it; I can run the samples with Scala
or Python, but for the spark-sql shell I cannot get an example running
successfully. Can you give me an example I can run with ./bin/spark-sql
without writing any code? thanks
On Sat, Mar 28, 2015 at 7:35 AM, Ted Yu
Thanks for the follow-up, Dale.
bq. hdp 2.3.1
Minor correction: should be hdp 2.1.3
Cheers
On Sat, Mar 28, 2015 at 2:28 AM, Johnson, Dale daljohn...@ebay.com wrote:
Actually I did figure this out eventually.
I’m running on a Hortonworks cluster hdp 2.3.1 (hadoop 2.4.1). Spark
bundles
Hi All,
I'm facing performance issues with my Spark implementation, and was briefly
investigating the WebUI logs. I noticed that my RDD size is 55 GB, the
Shuffle Write is 10 GB, and the Input Size is 200 GB. The application is a web
application which does predictive analytics, so we keep most of our data in
Hi,
I’ve been trying to use Spark Streaming for my real-time analysis
application using the Kafka Stream API on a cluster (using the yarn
version) of 6 executors with 4 dedicated cores and 8192mb of dedicated
RAM.
The thing is, my application should run 24/7 but the disk usage is
leaking. This
It's worth adding that there's no guarantee that re-evaluated work will be on
the same host as before, and in the case of node failure, it is not guaranteed
to be elsewhere.
This means things that depend on host-local information are going to generate
different numbers even if there are no
Hi, did you resolve this issue or just work around it by keeping your
application jar local? Running into the same issue with 1.3.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-tp21840p22272.html
Looking at SparkSubmit#addJarToClasspath():
uri.getScheme match {
  case "file" | "local" =>
    ...
  case _ =>
    printWarning(s"Skip remote jar $uri.")
}
It seems the hdfs scheme is not recognized.
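A rough Python analogue of that scheme check (not Spark's actual code path, just an illustration of why an hdfs:// URI falls through to the warning):

```python
from urllib.parse import urlparse

def is_local_jar(uri: str) -> bool:
    # Mirrors the Scala match above: only "file" and "local" schemes
    # are added to the classpath; anything else is skipped as remote.
    scheme = urlparse(uri).scheme or "file"  # no scheme -> treated as a local file
    return scheme in ("file", "local")

print(is_local_jar("file:///tmp/app.jar"))    # True
print(is_local_jar("hdfs://nn:8020/app.jar")) # False -> "Skip remote jar"
```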
FYI
On Thu, Feb 26, 2015 at 6:09 PM, dilm dmend...@exist.com wrote:
I'm trying to run a
Hi Ankur
If your hardware is OK, it looks like a config problem. Can you show me
the config of spark-env.sh or your JVM config?
Thanks
Wisely Chen
2015-03-28 15:39 GMT+08:00 Ankur Srivastava ankur.srivast...@gmail.com:
Hi Wisely,
I have 26gb for driver and the master is running on m3.2xlarge
I've also been having trouble running 1.3.0 on HDP. The
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
configuration directive seems to work with pyspark, but does not propagate
when using spark-shell. (That is, everything works fine with pyspark,
and spark-shell fails with the bad
Note that speculation is off by default to avoid these kinds of unexpected
issues.
On Sat, Mar 28, 2015 at 6:21 AM, Steve Loughran ste...@hortonworks.com
wrote:
It's worth adding that there's no guarantee that re-evaluated work would
be on the same host as before, and in the case of node
got it thanks. Making sure everything is idempotent is definitely a
critical piece for peace of mind.
On Sat, Mar 28, 2015 at 1:47 PM, Aaron Davidson ilike...@gmail.com wrote:
Note that speculation is off by default to avoid these kinds of unexpected
issues.
On Sat, Mar 28, 2015 at 6:21 AM,
You are hitting https://issues.apache.org/jira/browse/SPARK-6330. It has
been fixed in 1.3.1, which will be released soon.
On Fri, Mar 27, 2015 at 10:42 PM, sud_self 852677...@qq.com wrote:
spark version is 1.3.0 with tachyon-0.6.1
QUESTION DESCRIPTION:
I have put more detail of my problem at
http://stackoverflow.com/questions/29295420/spark-kmeans-computation-cannot-be-distributed
I would really appreciate it if you could help me take a look at this problem. I
have tried various settings and ways to load/partition my data, but I just
cannot get rid
Could someone please share the spark-submit command that shows the mysql
jar containing the driver class used to connect to the Hive MySQL metastore.
Even after including it through
--driver-class-path
/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
OR (AND)
--jars
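For comparison, a minimal sketch of such an invocation (all paths and the class name here are hypothetical; adjust to your installation):

```shell
./bin/spark-submit \
  --master yarn-cluster \
  --driver-class-path /path/to/mysql-connector-java-5.1.34.jar \
  --jars /path/to/mysql-connector-java-5.1.34.jar \
  --class com.example.YourApp \
  your-app.jar
```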
This is from my Hive installation
-sh-4.1$ ls /apache/hive/lib | grep derby
derby-10.10.1.1.jar
derbyclient-10.10.1.1.jar
derbynet-10.10.1.1.jar
-sh-4.1$ ls /apache/hive/lib | grep datanucleus
datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
Hi
In broadcast, Spark will collect the whole 3 GB object onto the master node and
broadcast it to each slave. It is a very common situation that the master node
doesn't have enough memory.
What is your master node settings?
Wisely Chen
Ankur Srivastava ankur.srivast...@gmail.com wrote on Saturday, March 28, 2015:
I
Yes, I am using yarn-cluster and I did add it via --files. I get a
"suitable driver not found" error
Please share the spark-submit command that shows the mysql jar containing
the driver class used to connect to the Hive MySQL metastore.
Even after including it through
--driver-class-path
This is what I am seeing:
./bin/spark-submit -v --master yarn-cluster --driver-class-path
/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar
--jars
I tried with a different version of the driver but got the same error
./bin/spark-submit -v --master yarn-cluster --driver-class-path
Actually I did figure this out eventually.
I’m running on a Hortonworks cluster hdp 2.3.1 (hadoop 2.4.1). Spark bundles
the org/apache/hadoop/hdfs/… classes along with the spark-assembly jar. This
turns out to introduce a small incompatibility with hdp 2.3.1. I carved these
classes out of
How many dimensions does your data have? The size of the k-means model is k
* d, where d is the dimension of the data.
Since you're using k=1000, if your data has dimension higher than say,
10,000, you will have trouble, because k*d doubles have to fit in the
driver.
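A quick back-of-envelope sketch of that bound, using the figures assumed in this thread:

```python
# k-means model size: k centers x d dimensions, 8 bytes per double.
k, d = 1000, 10_000
model_bytes = k * d * 8
print(model_bytes)  # 80000000 bytes
print(f"{model_bytes / 1024**2:.0f} MiB")  # ~76 MiB that must fit on the driver
```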
Reza
On Sat, Mar 28, 2015
My vector dimension is 360 or so. The data count is about 270k. My
driver has 2.9 GB memory. I attach a screenshot of the current executor status.
I submitted this job with --master yarn-cluster. I have a total of 7
worker nodes; one of them acts as the driver. In the screenshot, you can see
all