Hi Missie,
In the Java API, you should consider:
1. RDD.map
https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html#map(scala.Function1,%20scala.reflect.ClassTag)
to transform the text
2. RDD.sortBy
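For example, a minimal sketch (Scala shown for brevity; the Java API is
analogous, and the input path is hypothetical):
val lines = sc.textFile("hdfs:///input.txt")
val upper = lines.map(_.toUpperCase)      // 1. transform the text
val sorted = upper.sortBy(line => line)   // 2. sort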
Hi,
What happens if the master node fails in the case of Spark Streaming? Would
the data be lost?
Thanks,
Swetha
Stati,
Change SPARK_REPL_OPTS to SPARK_SUBMIT_OPTS and try again. I faced the same
issue and making this change worked for me. I looked at the spark-shell
file under the bin dir and found SPARK_SUBMIT_OPTS being used.
SPARK_SUBMIT_OPTS=-XX:MaxPermSize=256m bin/spark-shell --master
See this thread: http://search-hadoop.com/m/q3RTtxVUrL1AvnPj2
On Tue, Jul 7, 2015 at 10:04 AM, Lincoln Atkinson lat...@microsoft.com
wrote:
I’m trying to build Spark from source on Windows 8.1, using a recent
Cygwin install and JDK 8u45. From the root of my enlistment, I’m running
Hi All,
I am working with dataframes and have been struggling with this thing, any
pointers would be helpful.
I've a Json file with the schema like this,
 |-- links: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- desc: string (nullable = true)
 |    |    |--
Looks like a workaround has gone in:
[SPARK-8819] Fix build for maven 3.3.x
FYI
On Tue, Jul 7, 2015 at 10:09 AM, Ted Yu yuzhih...@gmail.com wrote:
See this thread: http://search-hadoop.com/m/q3RTtxVUrL1AvnPj2
On Tue, Jul 7, 2015 at 10:04 AM, Lincoln Atkinson lat...@microsoft.com
wrote:
bq. Need I specify my spark version
Looks like the build used 1.4.0 SNAPSHOT. Please use 1.4.0 release.
Cheers
On Mon, Jul 6, 2015 at 11:50 PM, luohui20...@sina.com wrote:
Hi Grace,
recently I have been trying HiBench to evaluate my spark cluster; however,
I got a problem building HiBench,
I'm trying to build Spark from source on Windows 8.1, using a recent Cygwin
install and JDK 8u45. From the root of my enlistment, I'm running `build/mvn
-Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package`
The build moves along just fine for a while, until it builds Spark
Thanks,
I tried that, and the result was the same.
I can still start a master from the spark-1.4.0-bin-hadoop2.4 pre-built
version, though.
I don't really know what more to show than the strace I already linked, so
I could use any hint.
--
Henri Maxime Demoulin
2015-07-07 9:53
That solved it. Thanks!
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, July 07, 2015 10:21 AM
To: Lincoln Atkinson
Cc: user@spark.apache.org
Subject: Re: Windows - endless Dependency-reduced POM written... in Bagel
build
Looks like a workaround has gone in:
[SPARK-8819] Fix build
Hi, I tried both approaches, using df.repartition(6) and df.coalesce(6); neither
reduces the number of part files. Even after calling the above methods I still
see around 200 small part files of 20 MB each, which are again ORC files.
On Tue, Jul 7, 2015 at 12:52 AM, Sathish Kumaran Vairavelu
I found this post back in March 2014.
http://apache-spark-user-list.1001560.n3.nabble.com/Incrementally-add-remove-vertices-in-GraphX-td2227.html
I was wondering if there is any progress on GraphX Streaming/incremental
graph update in GraphX. Or is there a place where I can track the progress
on
Hello,
Reading from spark-csv, I got some lines with missing data (not invalid).
I'm applying map() to create a LabeledPoint with a denseVector, using
map( row => row.getDouble(col_index) )
To this point:
res173: org.apache.spark.mllib.regression.LabeledPoint =
This talk may help -
https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/
On Tue, Jul 7, 2015 at 9:51 AM, swetha swethakasire...@gmail.com wrote:
Hi,
What happens if the master node fails in the case of Spark Streaming? Would
the data be lost?
Hi Grace, recently I have been trying HiBench to evaluate my spark cluster;
however, I got a problem building HiBench. Would you help take a look?
Thanks. It fails at building SparkBench; you may check the attached pic
for more info. My spark version: 1.3.1, hadoop version: 2.7.0
Strange. What do you have in $SPARK_MASTER_IP? It may be that it is not
able to bind to the given IP, but again, that should be in the logs.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 12:54 AM, maxdml maxdemou...@gmail.com wrote:
Hi,
I've been compiling spark 1.4.0 with SBT, from the
Did you try Kryo? Wrap everything with Kryo and see if you are still
hitting the exception. (At least you would see a different exception stack.)
Thanks
Best Regards
On Tue, Jul 7, 2015 at 6:05 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, suffering from a pretty strange issue:
Dear all,
We've tried to use SparkSQL to insert from table A into table B,
where, using the exact same SQL script,
Hive is able to finish it but Spark 1.3.1 always ends with an OOM issue;
we tried several configuration including:
--executor-cores 2
--num-executors 300
Hi,
Where did the OOM happen?
In the driver or an executor?
Sometimes the SparkSQL driver OOMs on tables with a large number of partitions.
If so, you might want to increase spark.driver.memory in spark-defaults.conf.
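For example, in spark-defaults.conf (the value is illustrative):
spark.driver.memory    8g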
Shawn
On Jul 7, 2015, at 3:58 PM, shsh...@tsmc.com wrote:
Dear all,
We've tried to
Any Response?
2015-07-06 12:28 GMT+08:00 Tao Li litao.bupt...@gmail.com:
Nodes cloud10141049104.wd.nm.nop.sogou-op.org and
cloud101417770.wd.nm.ss.nop.sogou-op.org failed too many times. I want to
know whether a node can be taken offline automatically when it fails too many times.
2015-07-06 12:25 GMT+08:00 Tao
Just trying to get started with Spark and attempting to use HiveContext using
spark-shell to interact with existing Hive tables on my CDH cluster, but I keep
running into errors (pls see below) when I do hiveContext.sql("show
tables"). Wanted to know which JARs need to be included to have this
Hi, I am new to Apache Spark and I have tried to query Hive tables using
Spark SQL. First I tried it in spark-shell, where I can query 1
lakh records from a Hive table within a second. Then I tried it in Java
code, which always takes more than 10 seconds, and I have noted that each time
You can just use `--files` and I think it should work. Let us know on
https://issues.apache.org/jira/browse/SPARK-6833 if it doesn't work as
expected.
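For example, a sketch (file names are hypothetical; spark-submit can also
run R scripts in 1.4):
bin/spark-submit --files helper.R my_script.R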
Thanks
Shivaram
On Tue, Jul 7, 2015 at 5:13 AM, Michał Zieliński
zielinski.mich...@gmail.com wrote:
Hi all,
*spark-submit* for Python and
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
I'm getting org.apache.spark.sql.catalyst.analysis.NoSuchTableException
from:
val dataframe = hiveContext.table("other_db.mytable")
Do I have to change the current database to access it? Is it possible to
See this thread http://search-hadoop.com/m/q3RTt0NFls1XATV02
Cheers
On Tue, Jul 7, 2015 at 11:07 AM, Arun Luthra arun.lut...@gmail.com wrote:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
I'm getting
In Spark Streaming, when using updateStateByKey, it requires the generated
DStream to be checkpointed.
It seems that it always uses JavaSerializer, no matter what I set for
spark.serializer. Can I use KryoSerializer for checkpointing? If not, I
assume the key and value types have to be
Hi again,
Ok, now I do not know of any way to fix the problem other than deleting the
bad machine from the config and restarting... And you will need admin
privileges on the cluster for that :(
However, before we give up on the speculative execution, I suspect that the
task is being run again and again on
You probably want to explode the array to produce one row per element:
df.select(explode(df("links")).alias("link"))
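After exploding, each element's struct fields can be selected directly,
e.g. (a sketch):
df.select(explode(df("links")).alias("link")).select("link.desc")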
On Tue, Jul 7, 2015 at 10:29 AM, Naveen Madhire vmadh...@umail.iu.edu
wrote:
Hi All,
I am working with dataframes and have been struggling with this thing, any
pointers would be
Is there more documentation on what is needed to set up BLAS/LAPACK native
support with Spark?
I’ve built spark with the -Pnetlib-lgpl flag and see that the netlib
classes are in the assembly jar.
jar tvf spark-assembly-1.5.0-SNAPSHOT-hadoop2.6.0.jar | grep netlib |
grep Native
6625 Tue Jul 07
bq. my class has already implemented the java.io.Serializable
Can you show the code for the Model.User class?
Cheers
On Tue, Jul 7, 2015 at 8:18 AM, Hafsa Asif hafsa.a...@matchinguu.com
wrote:
Thank you so much for the solution. I ran the code like this:
JavaRDD<User> rdd =
SIGTERM on YARN generally means the NM is killing your executor because
it's running over its requested memory limits. Check your NM logs to make
sure. And then take a look at the memoryOverhead setting for driver and
executors (http://spark.apache.org/docs/latest/running-on-yarn.html).
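For example (a sketch; the values are illustrative):
spark-submit ... \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --conf spark.yarn.driver.memoryOverhead=1024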
On Tue,
Hi.
I am just wondering if the rdd was actually modified.
Did you test it by printing rdd.partitions.length before and after?
Regards,
Gylfi.
Sorry, I can't help with this issue, but if you are interested in a simple
way to launch a Spark cluster on Amazon, Spark is now offered as an
application in Amazon EMR. With this you can have a full cluster with a
few clicks:
https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/
When you enable checkpointing by setting the checkpoint directory, you
enable metadata checkpointing. Data checkpointing kicks in only if you are
using a DStream operation that requires it, or you are enabling Write Ahead
Logs to prevent data loss on driver failure.
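A minimal sketch of the pieces (directory and durations are illustrative):
val ssc = new StreamingContext(conf, Seconds(10))
ssc.checkpoint("hdfs:///checkpoints")  // enables metadata checkpointing
// data checkpointing kicks in for stateful ops such as updateStateByKey;
// the WAL is enabled via spark.streaming.receiver.writeAheadLog.enable=true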
More discussion -
I'm following the tutorial about Apache Spark on EC2. The output is the
following:
$ ./spark-ec2 -i ../spark.pem -k spark --copy launch spark-training
Setting up security groups...
Searching for existing cluster spark-training...
Latest Spark AMI: ami-19474270
Launching
I am trying to use the posexplode function in the HiveContext to
auto-generate a sequence number. This feature is supposed to be available
in Hive 0.13.0.
SELECT name, phone FROM contact LATERAL VIEW
posexplode(phoneList.phoneNumber) phoneTable AS pos, phone
My test program failed with the
Hi,
Did you try reducing the number of executors and cores? Usually num-executors *
executor-cores = number of parallel tasks, so you can reduce the number of
parallel tasks on the command line like
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn-cluster \
--num-executors
Hi,
I am seeing a lot of posts on singletons vs. broadcast variables, such as
*
http://apache-spark-user-list.1001560.n3.nabble.com/Best-way-to-have-some-singleton-per-worker-tt20277.html
*
I want to add spark-hive as a dependency to submit my job, but it seems that
spark-submit cannot resolve it.
$ ./bin/spark-submit \
    --packages org.apache.spark:spark-hive_2.10:1.4.0,org.postgresql:postgresql:9.3-1103-jdbc3,joda-time:joda-time:2.8.1 \
    --class
I think the properties that you have in your hdfs-site.xml should go in the
core-site.xml (at least the namenode.name and datanode.data ones). I
might be wrong here, but that's what I have in my setup.
You should also add hadoop.tmp.dir in your core-site.xml. That might be the
source of your
Thank you so much for the solution. I ran the code like this:
JavaRDD<User> rdd = context.parallelize(usersList);
JavaRDD<User> rdd_sorted_users = rdd.sortBy(new Function<User, String>() {
    @Override
    public String call(User usr1) throws Exception {
Would it be possible to have a wrapper class that just represents a
reference to a singleton holding the 3rd party object? It could proxy over
calls to the singleton object which will instantiate a private instance of
the 3rd party object lazily? I think something like this might work if the
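A hedged sketch of that idea in Scala (ThirdPartyClient is a hypothetical
stand-in for the 3rd party class):
class LazyProxy extends Serializable {
  // @transient keeps the instance out of the serialized closure;
  // lazy val creates it once per executor JVM on first use
  @transient lazy val client = new ThirdPartyClient()
}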
Hi Ratio -
You need more than just the hive-jdbc jar.
Here are all of the jars that I found were needed. I got this list from
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-RunningtheJDBCSampleCode
plus trial and error.
[jar list was attached as an inline image]
-- Eric
On
Did you do
yourRdd.coalesce(6).saveAsTextFile()
or
yourRdd.coalesce(6)
yourRdd.saveAsTextFile()
?
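Note that coalesce returns a new RDD rather than modifying yourRdd in place,
so the second variant saves the original RDD. A minimal sketch (output path
illustrative):
val reduced = yourRdd.coalesce(6)
reduced.saveAsTextFile("out")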
Srikanth
On Tue, Jul 7, 2015 at 12:59 PM, Umesh Kacha umesh.ka...@gmail.com wrote:
Hi I tried both approaches using df.repartition(6) and
Hi, can you help me with how to load data from an S3 bucket to Redshift? If you
have sample code, can you please send it to me?
Thanks, su
Say I have a spark job that looks like the following:
def loadTable1() {
  val table1 = sqlContext.jsonFile("s3://textfiledirectory/")
  table1.cache().registerTempTable("table1")
}
def loadTable2() {
  val table2 = sqlContext.jsonFile("s3://testfiledirectory2/")
Why does this not work? Is insert into broken in 1.3.1?
val ssc = new StreamingContext(sc, Minutes(10))
val currentStream = ssc.textFileStream("s3://textFileDirectory/")
val dayBefore = sqlContext.jsonFile("s3://textFileDirectory/")
dayBefore.saveAsParquetFile("/tmp/cache/dayBefore.parquet")
val
I know the typical way to apply a hive UDF to a dataframe is basically
something like:
dataframe.selectExpr("reverse(testString) as reversedString")
Is there a way to apply the hive UDF just to a single row and get a row
back? Something like:
dataframe.first.selectExpr("reverse(testString) as
spark-hive is excluded when using --packages, because it can be included in
the spark-assembly by adding -Phive during mvn package or sbt assembly.
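For reference, a sketch of building the assembly with Hive support
(standard 1.4 build profiles):
mvn -Phive -Phive-thriftserver -DskipTests clean package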
Best,
Burak
On Tue, Jul 7, 2015 at 8:06 AM, Hao Ren inv...@gmail.com wrote:
I want to add spark-hive as a dependence to submit my job, but it
Anand,
AFAIK, you will need to change two settings:
spark.streaming.unpersist = false // in order for Spark Streaming to not drop
the raw RDD data
spark.cleaner.ttl = some reasonable value in seconds
Also be aware that the lineage of your union RDD will grow with each batch
interval. You will need
Hi all,
*spark-submit* for Python and Java/Scala has *--py-files* and *--jars*
options for submitting additional files on top of the main application. Is
there any such option for *sparkr-submit*? I know that there is
*includePackage()* R function to add library dependencies, but can you add
Interesting, thanks for the heads up.
On 7/6/15, 7:19 PM, Davies Liu dav...@databricks.com wrote:
Currently, Python UDFs run in separate Python instances and are MUCH slower than
Scala ones (from 10 to 100x). There is a JIRA to improve the
performance: https://issues.apache.org/jira/browse/SPARK-8632, After
Hi all,
I am fairly new to spark and wonder if you can help me. I am exploring
GraphX/Spark by running the pagerank example on a medium size graph (12 GB)
using this command:
My cluster is 1+16 machines, the master has 15 GB memory and each worker
has 30 GB. The master has 2 cores and each
spark.streaming.unpersist = false // in order for Spark Streaming to not drop the
raw RDD data
spark.cleaner.ttl = some reasonable value in seconds
Why is the above suggested, given that the persist/cache operation on the
constantly unionized batch RDD will have to be invoked anyway (after every
Can you try adding sc.stop() at the end of your program? Looks like it's
having a hard time closing the SparkContext.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 4:08 PM, Hafsa Asif hafsa.a...@matchinguu.com
wrote:
Hi,
I run the following simple Java spark standalone app with maven command
Here's a simplified example:
SparkConf conf = new SparkConf().setAppName("Sigmoid").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
List<String> user = new ArrayList<String>();
user.add("Jack");
user.add("Jill");
I have also tried this code snippet, only hoping that it might at least
compile:
Function1<User, Object> FILTER_USER = new AbstractFunction1<User, Object>() {
    public Object apply(User user) {
        return user;
    }
};
FILTER_USER is fine but cannot be applied to the
http://www.meetup.com/Cincinnati-Apache-Spark-Meetup/
Thanks.
Darin.
Evo,
I'd let the OP clarify the question. I'm not in a position to clarify his
requirements beyond what's written in the question.
Regarding window vs mutable union: window is a well-supported feature that
accumulates messages over time. The mutable unioning of RDDs is bound to
operational
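For reference, a minimal window sketch (durations are illustrative):
// keep the last 60 seconds of data, recomputed every 20 seconds
val windowed = stream.window(Seconds(60), Seconds(20))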
I would suggest you take a look at DataFrames. Also, I do not think you
should implement comparators for the User class as a whole; rather, you should
get the attribute to sort/compare on and delegate sorting to the data type of
that attribute. E.g., sorting can be done by name, and if so, it should be
string
Yes, I do set $SPARK_MASTER_IP. I suspect a more internal issue, maybe
due to multiple spark/hdfs instances having successively run on the same
machine?
--
Henri Maxime Demoulin
2015-07-07 4:10 GMT-04:00 Akhil Das ak...@sigmoidanalytics.com:
Strange. What are you having in $SPARK_MASTER_IP? It
I am still receiving these weird SIGTERMs on the executors. The driver claims
it lost the executor; the executor receives a SIGTERM (from whom???).
It doesn't seem to be a memory-related issue, though increasing memory takes the
job a bit further or lets it complete. But why? There is no memory pressure on
Hi,
I have an object list of Users and I want to implement top() and filter()
methods on the object list. Let me explain the whole scenario:
1. I have a User object list named usersList, which I fill while reading the record set.
User user = new User();
Hi, all
I found an Exception when using spark-sql
java.lang.UnsatisfiedLinkError: Native Library
/data/lib/native/libgplcompression.so already loaded in another classloader ...
I set spark.sql.hive.metastore.jars=. in file spark-defaults.conf
It does not happen every time. Who knows
Hi,
I run the following simple Java spark standalone app with maven command
exec:java -Dexec.mainClass=SimpleApp
public class SimpleApp {
public static void main(String[] args) {
System.out.println("Reading and Connecting with Spark.");
try {
String logFile =
Hi MorEru,
the same problem occurred for me. I had to change the version of the maven
dependency from spark-core_2.11 to spark-core_2.10 and it worked.
Thanks
Himanshu
Hi,
Suppose I have an RDD that is loaded from some file and then I also have a
DStream that has data coming from some stream. I want to keep union some of
the tuples from the DStream into my RDD. For this I can use something like
this:
var myRDD: RDD[(String, Long)] = sc.fromText...
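A hedged sketch of one way to fold each batch into that var (myDStream is a
hypothetical name for the stream; note that the lineage of the unioned RDD
grows with every batch):
myDStream.foreachRDD { batch =>
  myRDD = myRDD.union(batch).cache()
}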
Thank you for your quick response. But I tried this and got the error shown
in the pic error.jpg
http://apache-spark-user-list.1001560.n3.nabble.com/file/n23676/error.jpg
Did you enable dynamic resource allocation? You can refer to this page
for how to configure the spark shuffle service for YARN.
https://spark.apache.org/docs/1.4.0/job-scheduling.html
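For example, a sketch of the relevant settings in spark-defaults.conf:
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true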
On Tue, Jul 7, 2015 at 10:55 PM, roy rp...@njit.edu wrote:
we tried --master yarn-client with no difference
dataframe.limit(1).selectExpr(xxx).collect()?
-Original Message-
From: chrish2312 [mailto:c...@palantir.com]
Sent: Wednesday, July 8, 2015 6:20 AM
To: user@spark.apache.org
Subject: Hive UDFs
I know the typical way to apply a hive UDF to a dataframe is basically
something like:
Hi Shawn,
Thanks a lot, that's actually the last parameter we overlooked!!
I'm able to run the same SQL on Spark now if I set spark.driver.memory
larger,
thanks again!!
--
Best Regards,
Felicia Shann
單師涵
+886-3-5636688 Ext. 7124300
I'm writing a streaming application and want to use spark-submit to submit
it to a YARN cluster. I'd like to submit it in a client node and exit
spark-submit after the application is running. Is it possible?
Hi Hui,
Could you please add more description (about the failure) in the HiBench GitHub
issues?
HiBench works with Spark 1.2 and above.
Thank you Best Regards,
Grace (Huang Jie)
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, July 8, 2015 12:50 AM
To: 罗辉
Cc: user; Huang, Jie
Subject:
Hi,
I have done a lot of EMR-S3-Redshift using Redshift COPY, haven't done
any from Spark yet but I plan on doing it soon and have been doing some
research. Take a look at this article - Best Practices for Micro-Batch
Loading on Amazon Redshift
Hi, everyone!
I've got <key, value> pairs in the form of <LongWritable, Text>, where I used the
following code:
SparkConf conf = new SparkConf().setAppName("MapReduceFileInput");
JavaSparkContext sc = new JavaSparkContext(conf);
Configuration confHadoop = new Configuration();
JavaPairRDD<LongWritable, Text>
Please take a look at core/src/test/java/org/apache/spark/JavaAPISuite.java
in source code.
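For instance, a hedged sketch of reading such pairs with the Scala API
(path and imports assumed):
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
val pairs = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///input")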
Cheers
On Tue, Jul 7, 2015 at 7:17 AM, 付雅丹 yadanfu1...@gmail.com wrote:
Hi, everyone!
I've got <key, value> pairs in the form of <LongWritable, Text>, where I used
the following code:
SparkConf conf = new
Right, I figured I'd need a custom partitioner from what I've read around!
Documentation on this is super sparse; do you have any recommended links on
solving data skew and/or creating custom partitioners in Spark 1.4?
I'd also love to hear if this is an unusual problem with my type of set-up -
I also tried sc.stop(). Sorry, I did not include that in my question, but I am
still getting the thread exception. I should also mention that I am working on
a VM.
15/07/07 06:00:32 ERROR ActorSystemImpl: Uncaught error from thread
[sparkDriver-akka.actor.default-dispatcher-5]
Requirements – then see my abstracted interpretation – what else do you need in
terms of Requirements …:
“Suppose I have an RDD that is loaded from some file and then I also have a
DStream that has data coming from some stream. I want to keep union some of the
tuples from the DStream into
core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
it seems it is hardcoded in ExecutorRunnable.scala:
val commands = prefixEnv ++ Seq(
  YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java",
  "-server",
  // Kill if OOM is raised - leverage yarn's failure handling to cause
  // rescheduling.
  // Not killing the
Can you try renaming the ~/.ivy2 directory to ~/.ivy2_backup, building
spark 1.4.0 again, and running it?
Thanks
Best Regards
On Tue, Jul 7, 2015 at 6:27 PM, Max Demoulin maxdemou...@gmail.com wrote:
Yes, I do set $SPARK_MASTER_IP. I suspect a more internal issue, maybe
due to multiple spark/hdfs
I get a suspicious SIGTERM on the executors that doesn't seem to be from the
driver. The other thing that might send a SIGTERM is the
-XX:OnOutOfMemoryError='kill %p' java arg that the executor starts with. Now
my tasks don't seem to run out of memory, so how can I disable this param to
debug them?
Hi Himanshu,
I am using spark-core_2.10 in my maven dependency. There were no issues with
that.
The problem I had was that the spark master was running on
localhost inside the VM and the slave was not able to connect to it.
I changed the spark master to run on the private IP address
Hi,
I have CDH 5.4 installed on a linux server. It has 1 cluster in which spark
is deployed as a history server.
I am trying to connect my laptop to the spark history server.
When I run spark-shell --master ip:port I get the following output
How can I verify that the worker is connected to
spark-submit is nothing but a process in your OS, so you should be able to
submit it in the background and exit. However, your spark-submit process itself
is the driver for your spark streaming application, so it will not exit for
the lifetime of the streaming app.
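One option (a sketch): submit in yarn-cluster mode, where the driver runs in
the YARN application master rather than in spark-submit, and tell the client
not to wait:
spark-submit --master yarn-cluster \
  --conf spark.yarn.submit.waitAppCompletion=false ...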
On Wed, Jul 8, 2015 at 1:13 PM, Bin
Hi Ashish,
Are you running Spark-on-YARN on the cluster with an instance of the Spark History
server?
Also, if you are using Cloudera Manager with Spark on YARN, the Spark on YARN
service has a link to the history server web UI.
Can you paste the command and the output you are seeing in the
Thank you, Ayan, for your response. But I have just realised that Spark
is configured to be a history server.
Please, can somebody suggest how I can convert the Spark history server
to be a master server?
Thank you
Sincerely,
Ashish Dutt
On Wed, Jul 8, 2015 at 12:28 PM, ayan guha
Hello Guru,
Thank you for your quick response.
This is what I get when I try executing spark-shell --master ip:port
C:\spark-1.4.0\bin>spark-shell --master IP:18088
log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please
It worked Zhou.
On Mon, Jul 6, 2015 at 10:43 PM, Wei Zhou zhweisop...@gmail.com wrote:
I used val output: RDD[(DetailInputRecord, VISummary)] =
sc.emptyRDD[(DetailInputRecord,
VISummary)] to create an empty RDD before. Give it a try; it might work for
you too.
2015-07-06 14:11 GMT-07:00
The following errors occur when building using mvn with options clean
package.
Are there some requirements/restrictions on profiles/settings for catalyst
to build properly?
[error]
/shared/sparkup2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala:138:
value
Hi Ashish,
If you are not using Spark on YARN and instead using Spark Standalone, you
don’t need Spark history server. More on the Web Interfaces is provided in the
following link. Since you are using standalone mode, you should be able to access
the web UI for the master and workers at ports that
hi,
there are some useful functions in DoubleRDDFunctions which I can use if I
have an RDD[Double], e.g. mean, variance.
Vector doesn't have such methods; how can I convert a Vector to RDD[Double]?
Or, better, can I call mean directly on a Vector?
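One possible approach (a sketch, assuming an MLlib Vector named vec):
val rdd: RDD[Double] = sc.parallelize(vec.toArray)
rdd.mean()     // DoubleRDDFunctions via implicits
rdd.variance()
// or locally, without an RDD: vec.toArray.sum / vec.size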
On UI?
Master: http://masterip:8080
Worker: http://workerIp:8081
On Wed, Jul 8, 2015 at 1:42 PM, Ashish Dutt ashish.du...@gmail.com wrote:
Hi,
I have CDH 5.4 installed on a linux server. It has 1 cluster in which
spark is deployed as a history server.
I am trying to connect my laptop to the
Hello Guru,
Many thanks for your reply.
I am new to this whole thing, so pardon my naivety at times.
I am not sure whether I am using Spark standalone or Spark on YARN, because when
I check the port number of Spark it shows 18088, and as you have
mentioned, maybe it is then Spark on YARN.
Hi Srikanth, thanks for the response. I have the following code:
hiveContext.sql("insert into... ").coalesce(6)
The above code does not create 6 part files; it creates around 200 small files.
Please guide. Thanks.
On Jul 8, 2015 4:07 AM, Srikanth srikanth...@gmail.com wrote:
Did you do
Hi, bdev
Derby is the default embedded DB for the Hive metastore if you do not specify
hive.metastore.uris. Please take a look at the lib directory of Hive; you can
find the derby jar there. Spark does not require Derby by default
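For example, to point Spark at an external metastore instead, a
conf/hive-site.xml sketch (host and port are illustrative):
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>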
At 2015-07-07 17:07:28, bdev buntu...@gmail.com wrote:
Just
Each time you run the jar, a new JVM will be started; maintaining a connection
between different JVMs is not the correct way to think about it
each time when I run that jar it tries to make connection with hive metastore
At 2015-07-07 17:07:06, wazza rajeshkumarit8...@gmail.com wrote:
Hi I am new to