Hey, so I start the context at the very end when all the piping is done.
BTW a foreachRDD will be called on the resulting dstream.map() right after
that.
The puzzling thing is why removing the context bounds solves the problem...
What does this exception mean in general?
On Mon, Apr 20, 2015 at
Hello,
I have the above naive question if anyone could help: why not use a
row-based file format to save row-based DataFrames/RDDs?
Thanks,
Lan
Did you keep adding new files under the `train/` folder? What was the
exact warn message? -Xiangrui
On Fri, Apr 17, 2015 at 4:56 AM, barisak baris.akg...@gmail.com wrote:
Hi,
I wrote this code just to train the streaming linear regression model, but I got a
'no data found' warning, so no weights were
Hi Spark community,
I'm very new to the Apache Spark community; but if this (very active) group
is anything like the other Apache project user groups I've worked with, I'm
going to enjoy discussions here very much. Thanks in advance!
Use Case:
I am trying to go from flat files of user response
Hi Marcelo,
Exactly what I need to track, thanks for the JIRA pointer.
Date: Mon, 20 Apr 2015 14:03:55 -0700
Subject: Re: GSSException when submitting Spark job in yarn-cluster mode with
HiveContext APIs on Kerberos cluster
From: van...@cloudera.com
To: alee...@hotmail.com
CC:
Does anybody have an idea? a clue? a hint?
Thanks!
Renato M.
2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com:
Hi all,
I have a simple query "Select * from tableX where attribute1 between 0 and
5" that I run over a Kryo file with four partitions that ends up
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes in your filter function, as you are not
using a filter on the SQL side.
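For illustration, a rough sketch of the difference (assuming tableX is registered as a temp table in spark-shell, and rdd stands for your RDD of case-class rows — both names are from your mail, the rest is an assumption):

    // Spark SQL can prune columns and push the BETWEEN predicate into the scan:
    val viaSql = sqlContext.sql(
      "SELECT attribute1 FROM tableX WHERE attribute1 BETWEEN 0 AND 5")

    // whereas a hand-rolled map().filter() deserializes every row fully first:
    val byHand = rdd.map(r => r.attribute1).filter(a => a >= 0 && a <= 5)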
Best
Ayan
On 21 Apr 2015 08:05, Renato Marroquín Mogrovejo
You can find the user guide for vector creation here:
http://spark.apache.org/docs/latest/mllib-data-types.html#local-vector.
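For a quick example, the dense/sparse constructors from that page:

    import org.apache.spark.mllib.linalg.Vectors

    // dense vector [1.0, 0.0, 3.0]
    val dv = Vectors.dense(1.0, 0.0, 3.0)
    // the same vector in sparse form: size 3, non-zeros at indices 0 and 2
    val sv = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))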
-Xiangrui
On Mon, Apr 20, 2015 at 2:32 PM, Dan DeCapria, CivicScience
dan.decap...@civicscience.com wrote:
Hi Spark community,
I'm very new to the Apache Spark
Responses inline.
On Mon, Apr 20, 2015 at 3:27 PM, Ankit Patel patel7...@hotmail.com wrote:
What you said is correct, and I am expecting the printlns to be in my
console or my SparkUI. I do not see it in either place.
Can you actually log in to the machine running the executor which runs the
Is HiveContext still preferred over SQLContext?
What are the current (1.3.1) differences between them?
thanks
Daniel
Could you attach the full stack trace? Please also include the stack
trace from executors, which you can find on the Spark WebUI. -Xiangrui
On Thu, Apr 16, 2015 at 1:00 PM, riginos samarasrigi...@gmail.com wrote:
I have a big dataset of categories of cars and descriptions of cars. So I
want to
Greetings,
We have an analytics workflow system in production. This system is built in
Java and utilizes other services (including Apache Solr). It works fine
with a moderate level of data/processing load. However, when the load goes
beyond certain limit (e.g., more than 10 million
Please take a look at https://issues.apache.org/jira/browse/PHOENIX-1815
On Mon, Apr 20, 2015 at 10:11 AM, Jeetendra Gangele gangele...@gmail.com
wrote:
Thanks for the reply.
Will using Phoenix inside Spark be useful?
What is the best way to bring data from HBase into Spark in terms
I think the recommended use will be creating a DataFrame using HBase as the source.
Then you can run any SQL on that DF.
In 1.2 you can create a base RDD and then apply a schema in the same manner.
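A rough sketch of the RDD-plus-schema route (table and column names are made up; assumes spark-shell with the HBase jars on the classpath):

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val hconf = HBaseConfiguration.create()
    hconf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table

    // each record is (row key, Result)
    val hbaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    // apply a schema so SQL can run over it
    val schema = StructType(Seq(StructField("rowkey", StringType)))
    val rows = hbaseRDD.map { case (key, _) => Row(new String(key.get())) }
    val df = sqlContext.createDataFrame(rows, schema)
    df.registerTempTable("hbase_table")
    sqlContext.sql("SELECT * FROM hbase_table WHERE rowkey >= '0'")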
On 21 Apr 2015 03:12, Jeetendra Gangele gangele...@gmail.com wrote:
Thanks for the reply.
Will using Phoenix
There is a cost to converting from JavaBeans to Rows and this code path has
not been optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are
You are probably using an encoding that we don't support. I think this PR
may be adding that support: https://github.com/apache/spark/pull/5422
On Sat, Apr 18, 2015 at 5:40 PM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
I have created a bunch of protobuf based parquet files that
You can always create another DF using a map. In reality the operations are
lazy, so only the final value will get computed.
Can you provide the use case in a little more detail?
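For instance, to "update" a column you derive a new DataFrame (a minimal sketch; df and the column name are hypothetical):

    import org.apache.spark.sql.functions._

    // DataFrames are immutable, so produce a new one with the column replaced
    val updated = df.withColumn("price", col("price") * 2)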
On 21 Apr 2015 08:39, ARose ashley.r...@telarix.com wrote:
In my Java application, I want to update the values of a Column in a
Thank you :)
On Mon, Apr 20, 2015 at 4:46 PM, Jörn Franke jornfra...@gmail.com wrote:
Hi, if you have just one physical machine then I would try out Docker
instead of a full VM (a full VM would be a waste of memory and CPU).
Best regards
On 20 Apr 2015 at 00:11, hnahak harihar1...@gmail.com wrote:
What you said is correct, and I am expecting the printlns to be in my console or
my SparkUI. I do not see it in either place. However, if you run the program
then the printlns do print for the constructor of the receiver and for the
foreach statements, with total count 0. When you run it in
In my Java application, I want to update the values of a Column in a given
DataFrame. However, I realize DataFrames are immutable, and therefore cannot
be updated by conventional means. Is there a workaround for this sort of
transformation? If so, can someone provide an example?
You should check where MyDenseVectorUDT is defined and whether it was
on the classpath (or in the assembly jar) at runtime. Make sure the
full class name (with package name) is used. Btw, UDTs are not public
yet, so please use it with caution. -Xiangrui
On Fri, Apr 17, 2015 at 12:45 AM, Jaonary
I got exactly the same problem, except that I'm running on a standalone
master. Can you tell me the counterpart parameter on a standalone master for
increasing the same memory overhead?
I keep seeing these warnings when using trainImplicit:
WARN TaskSetManager: Stage 246 contains a task of very large size (208 KB).
The maximum recommended task size is 100 KB.
And then the task size starts to increase. Is this a known issue?
Thanks!
--
Blog http://blog.christianperone.com |
Hi,
I am studying the RDD caching function and wrote a small program to verify it.
I run the program in a Spark 1.3.0 environment on a YARN cluster, but I hit a
weird exception. It isn't always generated in the log; only sometimes can I see
this exception, and it does not affect the output
It also is a little more evidence for Jonathan's suggestion that there is a
null / 0 record that is getting grouped together.
To fix this, do I need to run a filter?
val viEventsRaw = details.map { vi => (vi.get(14).asInstanceOf[Long],
vi) }
val viEvents = viEventsRaw.filter { case
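i.e. something along these lines (an assumption: the Long pulled from index 14 is the key and 0 marks the missing value):

    // drop the null/0-key records before they get grouped together
    val viEvents = viEventsRaw.filter { case (itemId, _) => itemId != 0L }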
In my understanding you need to create a key out of the data and
repartition both datasets to achieve a map-side join.
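Roughly like this (a sketch with toy pair RDDs standing in for your two datasets; the key type and partition count are arbitrary):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._ // pair-RDD functions

    val left  = sc.parallelize(Seq((1L, "l1"), (2L, "l2")))
    val right = sc.parallelize(Seq((1L, "r1"), (3L, "r3")))

    // co-partition both sides with the same partitioner
    val p = new HashPartitioner(8)
    val leftP  = left.partitionBy(p)
    val rightP = right.partitionBy(p)

    // with identical partitioners, the join avoids re-shuffling either side
    val joined = leftP.join(rightP)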
On 21 Apr 2015 14:10, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Can someone share their working code of Map Side join in Spark + Scala.
(No Spark-SQL)
The only resource I could
Hi all,
Is there anything to integrate Spark with Accumulo, or to make Spark
process Accumulo data?
Thanks
Madhvi Gupta
I have built a data analytics SaaS platform by creating REST endpoints, and
based on the type of job request I would invoke the necessary Spark job/jobs
and return the results as JSON (async). I used yarn-client mode to submit the
jobs to the yarn cluster.
hope this helps.
After the above changes
1) filter shows this
Tasks: Index 0, ID 1, Attempt 0, Status SUCCESS, Locality Level ANY,
Executor ID / Host 1 / phxaishdc9dn1571.stratus.phx.ebay.com, Launch Time
2015/04/20 20:55:31, Duration 7.4 min, GC Time 21 s, Input 129.7
Can someone share their working code of Map Side join in Spark + Scala. (No
Spark-SQL)
The only resource I could find was this (open in Chrome with a Chinese-to-English
translator):
http://dongxicheng.org/framework-on-yarn/apache-spark-join-two-tables/
--
Deepak
Could you do it using flatMap?
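e.g. something like this, reusing the broadcast map from your snippet (a sketch; viEvents and broadCastMap are the names from your code, the output shape is an assumption):

    // flatMap lets each input yield zero or one outputs, so a miss is a skip
    val joined = viEvents.flatMap { case (itemId, vi) =>
      broadCastMap.value.get(itemId) match {
        case Some(lstg) => Some((itemId, (lstg, vi)))
        case None       => None // key not in small table: skipped, not mapped
      }
    }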
Punya
On Tue, Apr 21, 2015 at 12:19 AM ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
The reason I am asking this is that I am not able to understand how to do a
skip.
1) Broadcast small table-1 as map.
2) I just do .map() on large table-2.
When you do .map()
I did this
val lstgItemMap = listings.map { lstg => (lstg.getItemId().toLong, lstg)
}.collectAsMap
val broadCastMap = sc.broadcast(lstgItemMap)
val viEventsWithListings: RDD[(Long, (DetailInputRecord, VISummary,
Long))] = viEvents.mapPartitions({
iter =>
val lstgItemMap =
You can use rdd.unpersist(); it's documented on the Spark programming guide page
under the Removing Data section.
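For example:

    val data = sc.textFile("hdfs://path/to/file").cache() // path illustrative
    data.count()     // materializes the cached partitions
    data.unpersist() // manually removes them from the cache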
Ayan
On 21 Apr 2015 13:16, Wei Wei vivie...@gmail.com wrote:
Hey folks,
I am trying to load a directory of avro files like this in spark-shell:
val data =
Hey folks,
I am trying to load a directory of avro files like this in spark-shell:
val data = sqlContext.avroFile("hdfs://path/to/dir/*").cache
data.count
This works fine, but when more files are uploaded to that directory
running these two lines again yields the same result. I suspect there
is
The reason I am asking this is that I am not able to understand how to do a
skip.
1) Broadcast small table-1 as map.
2) I just do .map() on large table-2.
When you do .map() you must map each element to a new element.
However, with a map-side join, when I get the broadcasted map, I will search
in
What is re-partition?
On Tue, Apr 21, 2015 at 10:23 AM, ayan guha guha.a...@gmail.com wrote:
In my understanding you need to create a key out of the data and
repartition both datasets to achieve map side join.
On 21 Apr 2015 14:10, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Can someone share
You have a window operation; I have seen that behaviour before with window
operations in spark streaming. My solution was to move away from window
operations using probabilistic data structures; it might not be an option
for you.
2015-04-20 10:29 GMT+01:00 Marius Soutier mps@gmail.com:
The
What machines are HDFS data nodes -- just your master? That would
explain it. Otherwise, is it actually the write that's slow, or is
something else you're doing much faster on the master for other
reasons maybe? Like you're actually shipping data via the master first
in some local computation? So
Hi all,
On https://spark.apache.org/docs/latest/programming-guide.html
under RDD Persistence, Removing Data, it states:
Spark automatically monitors cache usage on each node and drops out old
data partitions in a least-recently-used (LRU) fashion.
Can it be understood that the cache will
Not sure what would slow it down, as the repartition completes equally fast
on all nodes, implying that the data is available on all; then there are a
few computation steps, none of them local to the master.
On Mon, Apr 20, 2015 at 12:57 PM, Sean Owen so...@cloudera.com wrote:
What machines are
Check whether your partitioning results in balanced partitions, i.e. partitions
with similar sizes - one of the reasons for the performance differences
observed by you may be that after your explicit repartitioning, the partition
on your master node is much smaller than the RDD partitions on the
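One way to check this (a small sketch; rdd stands for whichever RDD you repartitioned):

    // count records per partition to spot skew
    val sizes = rdd.mapPartitionsWithIndex { (idx, iter) =>
      Iterator((idx, iter.size))
    }.collect()
    sizes.foreach { case (idx, n) => println(s"partition $idx: $n records") }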
Hi
Just upgraded to Spark 1.3.1.
I am getting a warning
Warning (from warnings module):
File
D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py,
line 191
warnings.warn("inferSchema is deprecated, please use createDataFrame
I think the memory requested by your job, i.e. 2.0 GB, is higher than what is
available.
Please request 256 MB explicitly while creating the Spark Context and try
again.
Thanks and Regards,
Suraj Sheth
On Mon, Apr 20, 2015 at 2:44 PM, madhvi madhvi.gu...@orkash.com wrote:
PFA screenshot of my
There are a lot of similar problems shared and resolved by users on this same
portal. I have been part of those discussions before. Search those, please
try them, and let us know if you still face problems.
Thanks and Regards,
Archit Thakur.
On Mon, Apr 20, 2015 at 3:05 PM, madhvi
The same difference as between map and foreach: mapPartitions takes an iterator
and returns an iterator; foreachPartition takes an iterator and returns Unit.
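A small illustration (rdd is assumed to be an RDD[String]):

    // mapPartitions: transformation, Iterator => Iterator, yields a new RDD
    val lengths = rdd.mapPartitions(iter => iter.map(_.length))

    // foreachPartition: action, Iterator => Unit, e.g. for per-partition I/O
    rdd.foreachPartition { iter =>
      // e.g. open one connection per partition here (hypothetical)
      iter.foreach(println)
    }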
On Mon, Apr 20, 2015 at 4:05 PM, Arun Patel arunp.bigd...@gmail.com wrote:
What is difference between mapPartitions vs foreachPartition?
When to use these?
Thanks,
Arun
On Monday 20 April 2015 03:18 PM, Archit Thakur wrote:
There are a lot of similar problems shared and resolved by users on this
same portal. I have been part of those discussions before. Search
those, please try them, and let us know if you still face problems.
Thanks and Regards,
Archit
The code I've written is simple, as it just invokes a thread and calls a store
method on the Receiver class.
I see this code with printlns working fine when I try spark-submit --jars
jar --class test.TestCustomReceiver jar
However it does not work when I try the same command above with --master
Thanks, it helped.
We can't use Spark 1.3 because Cassandra DSE doesn't support it.
2015-04-17 21:48 GMT+02:00 Imran Rashid iras...@cloudera.com:
Are you calling sc.stop() at the end of your applications?
The history server only displays completed applications, but if you don't
call
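The usual pattern is something like this (a minimal sketch; the app name is illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
    try {
      // ... job logic ...
    } finally {
      sc.stop() // marks the application as completed for the history server
    }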
True.
On Mon, Apr 20, 2015 at 4:14 PM, Arun Patel arunp.bigd...@gmail.com wrote:
mapPartitions is a transformation and foreachPartition is an action?
Thanks
Arun
On Mon, Apr 20, 2015 at 4:38 AM, Archit Thakur archit279tha...@gmail.com
wrote:
The same difference as between map and foreach.
Hi all,
I have a simple query "Select * from tableX where attribute1 between 0 and
5" that I run over a Kryo file with four partitions that ends up being
around 3.5 million rows in our case.
If I run this query by doing a simple map().filter() it takes around 9.6
seconds, but when I apply a schema,
Hi all,
I am running the Python process that communicates with Spark in a
virtualenv. Is there any way I can make sure that the Python processes
of the workers are also started in a virtualenv? Currently I am getting
ImportErrors when the worker tries to unpickle stuff that is not
installed
Are you seeing your task being submitted in the UI? Under completed or
running tasks? How many resources are you allocating for your job? Can you
share a screenshot of your cluster UI and the code snippet that you are
trying to run?
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:37 PM, madhvi
try doing a sc.addJar("path/to/your/postgres/jar")
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:26 PM, shashanksoni shashankso...@gmail.com
wrote:
I am using spark 1.3 standalone cluster on my local windows and trying to
load data from one of our server. Below is my code -
import os
Hi all,
I have a three node cluster with identical hardware. I am trying a workflow
where it reads data from hdfs, repartitions it and runs a few map operations
then writes the results back to hdfs.
It looks like all the computation, including the repartitioning and the
maps, completes within
On Monday 20 April 2015 02:52 PM, SURAJ SHETH wrote:
Hi Madhvi,
I think the memory requested by your job, i.e. 2.0 GB is higher than
what is available.
Please request for 256 MB explicitly while creating Spark Context and
try again.
Thanks and Regards,
Suraj Sheth
Tried the same but still
I'm looking for that build too.
-Conor
On Mon, Apr 20, 2015 at 9:18 AM, Marius Soutier mps@gmail.com wrote:
Same problem here...
On 20.04.2015, at 09:59, Zsolt Tóth toth.zsolt@gmail.com wrote:
Hi all,
it looks like the 1.2.2 pre-built version for hadoop2.4 is not available
on
Hi all,
it looks like the 1.2.2 pre-built version for hadoop2.4 is not available on
the mirror sites. Am I missing something?
Regards,
Zsolt
Any pointers on this issue?
Thanks & Regards
Brahma Reddy Battula
From: Brahma Reddy Battula [brahmareddy.batt...@huawei.com]
Sent: Monday, April 20, 2015 9:30 AM
To: Sean Owen; Ted Yu
Cc: user@spark.apache.org
Subject: RE: compliation error
Thanks
Same problem here...
On 20.04.2015, at 09:59, Zsolt Tóth toth.zsolt@gmail.com wrote:
Hi all,
it looks like the 1.2.2 pre-built version for hadoop2.4 is not available on
the mirror sites. Am I missing something?
Regards,
Zsolt
Can you try sc.addJar("/path/to/your/hive/jar")? I think it will resolve it.
Thanks
Best Regards
On Mon, Apr 20, 2015 at 12:26 PM, Manku Timma manku.tim...@gmail.com
wrote:
Akhil,
But the first case of creating HiveConf on the executor works fine (map
case). Only the second case fails. I was
Hi,
I did what you told me, but now it is giving the following error:
ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler:
All masters are unresponsive! Giving up.
On the UI it is showing that the master is working.
Thanks
Madhvi
On Monday 20 April 2015 12:28 PM, Akhil Das wrote:
In
I don’t think this problem is related to Netty or NIO; switching to NIO will
not change this part of the code path that gets the index file for the
sort-based shuffle reader.
I think you could check your system from a few aspects:
1. Is there any hardware problem, like a full disk or similar, which makes this
Hi Twinkle,
We have a use case where we want to debug how and why an
executor got killed.
It could be because of a stack overflow, GC, or any other unexpected scenario.
If I look at the driver UI there is no information present around killed
executors, so I was just curious how people
The processing speed displayed in the UI doesn’t seem to take everything into
account. I also had a low processing time but had to increase batch duration
from 30 seconds to 1 minute because waiting batches kept increasing. Now it
runs fine.
On 17.04.2015, at 13:30, González Salgado, Miquel
Hi Archit,
What is your use case and what kind of metrics are you planning to add?
Thanks,
Twinkle
On Fri, Apr 17, 2015 at 4:07 PM, Archit Thakur archit279tha...@gmail.com
wrote:
Hi,
We are planning to add new Metrics in Spark for the executors that got
killed during the execution. Was
No, I am not getting any task on the UI for the job I am running. Also, I have
set instances=1 but the UI is showing 2 workers. I am running the Java
word count code exactly, but I have the text file in HDFS. Following is
the part of my code I write to make the connection:
SparkConf sparkConf = new
What is difference between mapPartitions vs foreachPartition?
When to use these?
Thanks,
Arun
Hi Madhvi,
I think the memory requested by your job, i.e. 2.0 GB is higher than what
is available.
Please request for 256 MB explicitly while creating Spark Context and try
again.
Thanks and Regards,
Suraj Sheth
Hello!
I'm a newbie in Spark and I would like to understand some basic mechanisms of how
it works behind the scenes.
I have attached the lineage of my RDD and I have the following questions:
1. Why do I have 8 stages instead of 5? From the book Learning Spark
(Chapter 8 - http://bit.ly/1E0Hah7), I
mapPartitions is a transformation and foreachPartition is an action?
Thanks
Arun
On Mon, Apr 20, 2015 at 4:38 AM, Archit Thakur archit279tha...@gmail.com
wrote:
The same difference as between map and foreach: mapPartitions takes an iterator
and returns an iterator; foreachPartition takes an iterator and returns Unit.
On Mon, Apr
Hi,
I aim to do custom partitioning on a text file. I first convert it into a
pair RDD and then try to use my custom partitioner. However, somehow it is
not working. My code snippet is given below.
val file = sc.textFile(filePath)
val locLines = file.map(line => line.split("\t")).map(line =>
Hi.
My usage is only about the Spark core and HDFS, so no Spark SQL or
MLlib or other components involved.
I saw the hint on the
http://spark.apache.org/docs/latest/building-spark.html, with a sample
like:
build/sbt -Pyarn -Phadoop-2.3 assembly. (what's the -P for?)
Fundamentally, I'd like to
I implemented two kinds of DataSource: one loads data during buildScan, the
other returns my RDD class with partition information for future loading.
My RDD's compute gets the actorSystem from SparkEnv.get.actorSystem, then uses
Spray to interact with an HTTP endpoint, which is the same flow as
I also see that it's creating both receivers on the same executor, and that might
be the cause of one executor having more RDD blocks than the other. Can I suggest
that Spark create each receiver on a different executor?
Regards, Laeeq
On Monday, April 20, 2015 4:51 PM, Evo Eftimov evo.efti...@isecc.com
Hi All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase client API inside my Spark job.
Can anybody tell me whether Spark SQL will be fast enough and provide range
queries?
Regards
Jeetendra
What is meant by “streams” here:
1. Two different DStream receivers producing two different DStreams,
consuming from two different Kafka topics, each with a different message rate
2. One Kafka topic (hence only one message rate to consider) but with two
different DStream receivers
Hi,
Today I upgraded our code and cluster to 1.3.
We are using Spark 1.3 in Amazon EMR, AMI 3.6, including the history server and
Ganglia.
I also migrated all deprecated SchemaRDD into DataFrame.
Now when I'm trying to read parquet files from S3 I get the below
exception.
Actually it's not a problem if
They both have the same message rate, 300 records/sec.
On Monday, April 20, 2015 4:51 PM, Evo Eftimov evo.efti...@isecc.com
wrote:
Interesting:
Removing the history server and the '-a' option, and using AMI 3.5, fixed the problem.
Now the question is: what made the change?...
I vote for the '-a', but let me update...
On Mon, Apr 20, 2015 at 5:43 PM, Ophir Cohen oph...@gmail.com wrote:
Hi,
Today I upgraded our code and cluster to 1.3.
Rename your log4j_special.properties file as log4j.properties and place it
under the root of your jar file, you should be fine.
If you are using Maven to build your jar, please place the log4j.properties in the
src/main/resources folder.
However, please note that if you have other dependency jar
Hi all,
I need to configure spark executor log4j.properties on a standalone cluster.
It looks like placing the relevant properties file in the spark
configuration folder and setting the spark.executor.extraJavaOptions from
my application code:
sparkConf.set("spark.executor.extraJavaOptions",
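For reference, the complete call would look something like this (a sketch; the file path is illustrative and must be visible on each worker, e.g. shipped via --files):

    sparkConf.set("spark.executor.extraJavaOptions",
      "-Dlog4j.configuration=file:/path/to/log4j_special.properties")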
Hi,
I have two different topics and two Kafka receivers, one for each topic.
Regards, Laeeq
On Monday, April 20, 2015 4:28 PM, Evo Eftimov evo.efti...@isecc.com
wrote:
Hi,
I have two streams of data from Kafka. How can I make an approximately equal
number of RDD blocks on the two executors? Please see the attachment: one worker
has 1785 RDD blocks and the other has 26.
Regards, Laeeq
And what is the message rate of each topic mate – that was the other part of
the required clarifications
From: Laeeq Ahmed [mailto:laeeqsp...@yahoo.com]
Sent: Monday, April 20, 2015 3:38 PM
To: Evo Eftimov; user@spark.apache.org
Subject: Re: Equal number of RDD Blocks
Hi,
I have two
Well, Spark Streaming is supposed to create/distribute the receivers on
different cluster nodes. If you are saying that your receivers are actually
running on the same node, probably that node is getting most of the data, to
minimize the network transfer costs.
If you want to distribute your
Hi all,
I had posed this query as part of a different thread but did not get a
response there. So creating a new thread hoping to catch someone's
attention.
We are experiencing this issue of shuffle files being left behind and not
being cleaned up by Spark. Since this is a Spark streaming
Hi,
I am getting this serialization exception and I am not too sure what "Graph
is unexpectedly null when DStream is being serialized" means.
15/04/20 06:12:38 INFO yarn.ApplicationMaster: Final app status: FAILED,
exitCode: 15, (reason: User class threw exception: Task not serializable)
Exception
Write a cron job for this like the one below:
12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \+
32 * * * * find /tmp -type d -cmin +1440 -name spark-*-*-* -prune -exec
rm -rf {} \+
52 * * * * find $SPARK_LOCAL_DIR -mindepth 1 -maxdepth 1 -type d -cmin
+1440 -name spark-*-*-* -prune -exec rm -rf {} \+
This appears to be a problem with SSL.
I'm facing the same issue. Did you get around this somehow?
I'm running IBM Java 8, on linux ppc64le.
Hi Lan,
Thanks for the fast response. It could be a solution if it works. I have more
than one log4j properties file, for different run modes like
debug/production, for the executor and for the application core. I think I
would like to keep them separate. Then, I suppose I should give all the other
properties
Each application gets its own executor processes, so there should be no
problem running them in parallel.
Lan
On Apr 20, 2015, at 10:25 AM, Michael Ryabtsev michael...@gmail.com wrote:
Hi Lan,
Thanks for the fast response. It could be a solution if it works. I have more
than one log4j
Hi All,
I had a question about the Spark Thrift server.
I want to load some tables when the Spark server starts.
How can I configure this external initialization in Spark? I guess Spark
should have an interface configurable in spark-defaults.conf, and we can
implement the initialization function
And the winner is: AMI 3.6.
Apparently it does not work with it...
AMI 3.5 works great.
Interesting:
Removing the history server and the '-a' option, and using AMI 3.5, fixed the problem.
Now the question is: what made the change?...
I vote for the '-a', but let me update...
On Mon, Apr 20, 2015 at 5:43 PM,
Thanks for the reply.
Will using Phoenix inside Spark be useful?
What is the best way to bring data from HBase into Spark in terms of
application performance?
Regards
Jeetendra
On 20 April 2015 at 20:49, Ted Yu yuzhih...@gmail.com wrote:
To my knowledge, Spark SQL currently doesn't provide
To my knowledge, Spark SQL currently doesn't provide range scan capability
against HBase.
Cheers
On Apr 20, 2015, at 7:54 AM, Jeetendra Gangele gangele...@gmail.com wrote:
HI All,
I am querying HBase, combining the results, and using them in my Spark job.
I am querying HBase using the HBase
So I am trying to pull data from an external database using JDBC
Map<String, String> options = new HashMap<String, String>();
options.put("driver", driver);
options.put("url", dburl);
options.put("dbtable", tmpTrunk);
DataFrame tbTrunkInfo = sqlContext.load("jdbc", options);
And
Now this is very important:
“Normal RDDs” refers to “batch RDDs”. However, the default in-memory
persistence of RDDs which are part of a DStream is “serialized” rather than
actual (hydrated) objects. The Spark documentation states that serialization
is required for space and garbage
Is the only way to implement custom partitioning of a DStream via the foreach
approach, so as to gain access to the actual RDDs comprising the DStream and
hence their partitionBy method?
DStream has only a repartition method, accepting only the number of
partitions, BUT not the method of partitioning
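Besides foreachRDD, transform also exposes the underlying RDDs, so an arbitrary Partitioner can be applied per batch (a sketch; pairStream stands for a DStream of key/value pairs, and the partition count is arbitrary):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.SparkContext._ // pair-RDD functions

    // apply a custom Partitioner to every RDD of the stream
    val repartitioned = pairStream.transform { rdd =>
      rdd.partitionBy(new HashPartitioner(16))
    }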