Hi,
When running Spark in cluster mode, it occasionally gives out this warning
message. Will it affect the final result?
ERROR CoarseGrainedExecutorBackend: Driver 192.168.1.1:45725 disassociated!
Shutting down.
Hi all,
I use PostgreSQL to store the Hive metadata.
First, I imported a SQL script into the metastore database as follows:
psql -U postgres -d metastore -h 192.168.50.30 -f
hive-schema-1.2.0.postgres.sql
Then, when I started $SPARK_HOME/bin/spark-sql, PostgreSQL gave the
following
Hi Simon
Can you describe your problem in more detail?
I suspect that my problem is caused by the window function (or maybe the groupBy
agg functions).
If yours is the same, maybe we should report a bug.
At 2016-05-11 23:46:49, "Simon Schiff [via Apache Spark User List]"
Sorry, I have to make a correction again. It may still be a memory leak, because
in the end the memory usage goes up again...
Eventually, the streaming program crashed.
I'm building Spark from the branch-1.6 source with mvn -DskipTests package and
I'm running the following code with the Spark shell.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.read.json("persons.json")
val df2 =
You have kept the 3rd-party jars on HDFS. I don't think executors, as of today,
can download jars from HDFS. Can you try with a shared directory?
The application jar is downloaded by executors through an HTTP server.
-Raghav
On 12 May 2016 00:04, "Giri P" wrote:
> Yes..They are
Is the class mentioned in the exception below the parent class of the
anonymous "Function" class you're creating?
If so, you may need to make it serializable. Or make your function a
proper "standalone" class (either a nested static class or a top-level
one).
On Wed, May 11, 2016 at 3:55 PM,
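For example, a minimal sketch (class, field, and type names are illustrative,
not the original poster's code) of moving the logic into a standalone
serializable class:

import org.apache.spark.api.java.function.Function;

// A top-level (or static nested) class instead of an anonymous inner class.
// Spark's Function interface already extends java.io.Serializable, so only
// the fields declared here are serialized, not a possibly non-serializable
// enclosing class.
public class ParseLine implements Function<String, String[]> {
    @Override
    public String[] call(String line) {
        return line.split(",");
    }
}

It can then be used as rdd.map(new ParseLine()) without dragging the outer
class into the closure.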
I have a streaming app that receives very complicated JSON (twitter statuses).
I would like to work with it as a hash map. It would be very difficult to
define a POJO for this JSON. (I cannot use twitter4j.)
// map json string to map
JavaRDD> jsonMapRDD =
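A minimal sketch of one way to do this (names are illustrative; assumes
Jackson is on the classpath and that statusJsonRDD is the JavaRDD<String> of
raw status JSON):

import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.spark.api.java.JavaRDD;

// Parse each JSON string into a generic Map instead of defining a POJO.
// Jackson maps arbitrary JSON objects to nested Maps and Lists.
JavaRDD<Map<String, Object>> jsonMapRDD = statusJsonRDD.map(json -> {
    ObjectMapper mapper = new ObjectMapper(); // created inside the closure, so nothing non-serializable is captured
    @SuppressWarnings("unchecked")
    Map<String, Object> parsed = mapper.readValue(json, Map.class);
    return parsed;
});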
Have you seen this thread?
http://search-hadoop.com/m/q3RTtpO0qI3cp06/JodaDateTimeSerializer+spark=Re+NPE+when+using+Joda+DateTime
On Wed, May 11, 2016 at 2:18 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> Hi all,
>
> I'm trying to get to use spark.serializer.
> I set it in the
Hi all,
I'm trying to use spark.serializer.
I set it in spark-defaults.conf, but I started getting issues with datetimes.
As I understand it, I need to disable it.
Is there any way to keep using Kryo?
It seems I can use JodaDateTimeSerializer for datetimes, I'm just not sure how to
set it, and
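If it helps, a minimal sketch of one way to wire this up (assumes the
kryo-serializers library from de.javakaffee is on the classpath; the
registrator class name is illustrative):

import com.esotericsoftware.kryo.Kryo;
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer;
import org.apache.spark.serializer.KryoRegistrator;
import org.joda.time.DateTime;

// Tell Kryo how to (de)serialize Joda DateTime instances.
public class MyKryoRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(DateTime.class, new JodaDateTimeSerializer());
    }
}

and then in spark-defaults.conf something like:

spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator  com.example.MyKryoRegistrator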
Hi Pawel,
I'd like to hear more about your idea. Could you explain more why you would
like to have a gitter channel? What are the advantages over a mailing list
(like this one)? Have you had good experiences using gitter on other open
source projects?
Xinh
On Wed, May 11, 2016 at 11:10 AM, Sean
Run jps like below:
jps
19724 SparkSubmit
10612 Worker
and do ps awx|grep PID
for each number that represents these two descriptions, something like:
ps awx|grep 30208
30208 pts/2Sl+1:05 /usr/java/latest/bin/java -cp
Yes, I'm running this in standalone mode.
On Wed, May 11, 2016 at 6:23 PM, Mich Talebzadeh
wrote:
> are you running this in standalone mode? that is one physical host, and
> the executor will live inside the driver.
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
>
Hi,
I am hitting OutOfMemoryError issues with Spark executors. It happens
mainly during shuffle. Executors get killed with OutOfMemoryError. I have
tried setting spark.executor.extraJavaOptions to take a memory dump, but it's
not happening.
spark.executor.extraJavaOptions = "-XX:+UseCompressedOops
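For reference, a common way to request a heap dump when an executor hits OOM
is the standard HotSpot flags (a sketch; the dump path is illustrative and
must be writable on the executor hosts):

spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-oom.hprof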
Hi -
I have a very unusual problem which I am trying to solve, and I am not sure
if Spark would help here.
I have a directory: /X/Y/a.txt and in the same structure /X/Y/Z/b.txt.
a.txt contains a unique serial number, say:
12345
and b.txt contains key value pairs.
a,1
b,1,
c,0 etc.
Every day you
Yes, they are reachable. The application jar which I send as an argument is at
the same location as the third-party jar. The application jar is getting uploaded.
On Wed, May 11, 2016 at 10:51 AM, lalit sharma
wrote:
> Point to note as per docs as well :
>
> *Note that jars or python
>
>
> logical plan after optimizer execution:
>
> Project [id#0L,id#1L]
> !+- Filter (id#0L = cast(1 as bigint))
> ! +- Join Inner, Some((id#0L = id#1L))
> ! :- Subquery t
> ! : +- Relation[id#0L] JSONRelation
> ! +- Subquery u
> ! +- Relation[id#1L] JSONRelation
>
Somehow missed that ;)
Anything about the Datasets slowness?
On Wed, May 11, 2016, 21:02 Ted Yu wrote:
> Which release are you using ?
>
> You can use the following to disable UI:
> --conf spark.ui.enabled=false
>
> On Wed, May 11, 2016 at 10:59 AM, Amit Sela
I don't know of a gitter channel and I don't use it myself, FWIW. I
think anyone's welcome to start one.
I hesitate to recommend this, simply because it's preferable to have
one place for discussion rather than splitting it over several, and we
have to keep the @spark.apache.org mailing lists as the
Which release are you using ?
You can use the following to disable UI:
--conf spark.ui.enabled=false
On Wed, May 11, 2016 at 10:59 AM, Amit Sela wrote:
> I've ran a simple WordCount example with a very small List as
> input lines and ran it in standalone (local[*]), and
I've run a simple WordCount example with a very small List as input
lines and ran it in standalone (local[*]), and Datasets is very slow.
We're talking ~700 msec for RDDs while Datasets takes ~3.5 sec.
Is this just start-up overhead? Please note that I'm not timing the
context creation...
And
Point to note as per the docs as well:
Note that jars or python files that are passed to spark-submit should be
URIs reachable by Mesos slaves, as the Spark driver doesn't automatically
upload local jars.
http://spark.apache.org/docs/latest/running-on-mesos.html
No answer, but maybe one more time: a gitter channel for Spark users would
be a good idea!
On Mon, May 9, 2016 at 1:45 PM, Paweł Szulc wrote:
> Hi,
>
> I was wondering - why Spark does not have a gitter channel?
>
> --
> Regards,
> Paul Szulc
>
> twitter: @rabbitonweb
>
Please note:
The name of the HBase table is specified in:
def writeCatalog = s"""{
|"table":{"namespace":"default", "name":"table1"},
not by:
HBaseTableCatalog.newTable -> "5"
FYI
On Tue, May 10, 2016 at 3:11 PM, Ted Yu wrote:
> I think so.
>
>
This may be related to: https://issues.apache.org/jira/browse/SPARK-13773
Regards,
James
On 11 May 2016 at 15:49, Ted Yu wrote:
> In master branch, behavior is the same.
>
> Suggest opening a JIRA if you haven't done so.
>
> On Wed, May 11, 2016 at 6:55 AM, Tony Jin
I'm not using docker
On Wed, May 11, 2016 at 8:47 AM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:
> By any chance, are you using docker to execute?
> On 11 May 2016 21:16, "Raghavendra Pandey"
> wrote:
>
>> On 11 May 2016 02:13, "gpatcham"
Are you running this in standalone mode? That is, one physical host, and
the executor will live inside the driver.
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
I am running Spark on DSE Cassandra with multiple analytics data centers.
It is my understanding that with this setup you should have a CFS file
system for each data center. I was able to create an additional CFS file
system as described here:
By any chance, are you using docker to execute?
On 11 May 2016 21:16, "Raghavendra Pandey"
wrote:
> On 11 May 2016 02:13, "gpatcham" wrote:
>
> >
>
> > Hi All,
> >
> > I'm using --jars option in spark-submit to send 3rd party jars . But I
>
yes,
On Wed, May 11, 2016 at 5:43 PM, Deepak Sharma
wrote:
> Since you are registering workers from the same node , do you have enough
> cores and RAM(In this case >=9 cores and > = 24 GB ) on this
> node(11.14.224.24)?
>
> Thanks
> Deepak
>
> On Wed, May 11, 2016 at 9:08
On 11 May 2016 02:13, "gpatcham" wrote:
>
> Hi All,
>
> I'm using --jars option in spark-submit to send 3rd party jars . But I
don't
> see they are actually passed to mesos slaves. Getting Noclass found
> exceptions.
>
> This is how I'm using --jars option
>
> --jars
Since you are registering workers from the same node, do you have enough
cores and RAM (in this case >= 9 cores and >= 24 GB) on this
node (11.14.224.24)?
Thanks
Deepak
On Wed, May 11, 2016 at 9:08 PM, شجاع الرحمن بیگ
wrote:
> Hi All,
>
> I need to set same memory and
Hi All,
I need to set the same memory and cores for each worker on the same machine,
and for this purpose I have set the following properties in conf/spark-env.sh:
export SPARK_EXECUTOR_INSTANCE=3
export SPARK_WORKER_CORES=3
export SPARK_WORKER_MEMORY=8g
but only one worker is getting the desired memory and
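For comparison, a minimal spark-env.sh sketch of what the standalone docs
describe for running several workers per node (assuming the intent is three
workers with 3 cores and 8g each; note SPARK_WORKER_INSTANCES rather than
SPARK_EXECUTOR_INSTANCE):

export SPARK_WORKER_INSTANCES=3
export SPARK_WORKER_CORES=3
export SPARK_WORKER_MEMORY=8g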
You can create a column with the count of "/". Then take the max of it and
create that many columns for every row, with null fillers.
Raghav
On 11 May 2016 20:37, "Bharathi Raja" wrote:
Hi,
I have a dataframe column col1 with values something like
Hi,
I have a dataframe column col1 with values something like
“/client/service/version/method”. The number of “/” is not constant.
Could you please help me extract all the methods from the column col1?
In Pig I used SUBSTRING with LAST_INDEX_OF(“/”).
Thanks in advance.
Regards,
Raja
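A minimal sketch of one way to get the last segment (assumes Spark 1.5+, where
substring_index is available in org.apache.spark.sql.functions; the new column
name is illustrative):

import static org.apache.spark.sql.functions.substring_index;
import org.apache.spark.sql.DataFrame;

// Keep everything after the last "/", regardless of how many "/" the value
// has, similar to SUBSTRING with LAST_INDEX_OF("/") in Pig.
DataFrame withMethod = df.withColumn("method", substring_index(df.col("col1"), "/", -1));

If every segment is needed as its own column, Raghav's suggestion above
(counting the "/" and creating that many columns with null fillers) is the way
to go.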
In master branch, behavior is the same.
Suggest opening a JIRA if you haven't done so.
On Wed, May 11, 2016 at 6:55 AM, Tony Jin wrote:
> Hi guys,
>
> I have a problem about spark DataFrame. My spark version is 1.6.1.
> Basically, i used udf and df.withColumn to create a
Hi guys,
I have a problem with a Spark DataFrame. My Spark version is 1.6.1.
Basically, I used a UDF and df.withColumn to create a "new" column, and then
I filter the values on this new column and call show (an action). I see the
UDF function (which is used by withColumn to create the new column) is
Will try with a JSON relation, but with Spark's temp tables (Spark version
1.6) I get an optimized plan as you have mentioned. It should not be much
different though.
Query : "select t1.col2, t1.col3 from t1, t2 where t1.col1=t2.col1 and
t1.col3=7"
Plan :
Project [COL2#1,COL3#2]
+- Join Inner,
Looks like the exception was thrown from this line:
ByteBuffer.wrap(taskBinary.value),
Thread.currentThread.getContextClassLoader)
Comment for taskBinary says:
* @param taskBinary broadcasted version of the serialized RDD and the
function to apply on each
* partition
I use Sparkling Water 1.6.3 and Spark 1.6. I use Oracle Java 8 or OpenJDK 7.
Every time I transform a Spark DataFrame into an H2O DataFrame I get this
error and the Spark cluster dies:
ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last): File
In this case, isn't it better to perform the filter as early as possible even if
there could be unhandled predicates?
Telmo Rodrigues
On 11/05/2016, at 09:49, Rishi Mishra wrote:
> It does push the predicate. But as a relations are generic and might or might
> not
Hi,
I'm running a very simple job (textFile->map->groupBy->count) with Spark 1.6.0
on EMR 4.3 (Hadoop 2.7.1) and hitting this exception when running on
yarn-client, but not in local mode:
16/05/11 10:29:26 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID
1,
After 8 hours, the memory usage becomes stable. Using the top command, I find
it is 75%, which means 12 GB of memory.
But it still does not make sense, because my workload is very small.
I use Spark to calculate on one CSV file every 20 seconds. The size of
the CSV file is 1.3 MB.
So spark
You can use joins as a substitute for subqueries.
On Wed, May 11, 2016 at 1:27 PM, Divya Gehlot
wrote:
> Hi,
> I am using Spark 1.5.2 with Apache Phoenix 4.4
> As Spark 1.5.2 doesn't support subquery in where conditions .
>
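For example, a minimal sketch of rewriting a WHERE ... IN (subquery) as a join
(table and column names are illustrative, and a SQLContext named sqlContext is
assumed):

import org.apache.spark.sql.DataFrame;

// Instead of: SELECT * FROM child WHERE parent_id IN (SELECT id FROM parent)
DataFrame result = sqlContext.sql(
    "SELECT c.* FROM child c JOIN parent p ON c.parent_id = p.id");

The same thing can be expressed with the DataFrame API via DataFrame.join if
you prefer to avoid SQL strings.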
It does push the predicate. But as relations are generic and might or
might not handle some of the predicates, it needs to apply a filter for the
unhandled predicates.
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)
https://in.linkedin.com/in/rishiteshmishra
On Wed, May 11,
I think that Impala and Hive have this feature. Impala is faster than Hive;
Hive is probably better to use in batch mode.
Alonso Isidoro Roman
https://about.me/alonso.isidoro.roman
ok, thanks anyway
On Wed, May 11, 2016 at 12:15 AM, joyceye04 [via Apache Spark User List] <
ml-node+s1001560n26919...@n3.nabble.com> wrote:
> Not yet. And I turned to another way to bypass it just to finish my work.
> Still waiting for answers :(
>
Hi All,
I'm a newbie in Spark MLlib. In my office I have a statistician who works on
improving our matrix model for our recommendation engine. However, he works
in R. He told me that it's quite possible to combine collaborative
filtering and latent Dirichlet allocation (LDA) by doing some
Hi,
I am using Spark 1.5.2 with Apache Phoenix 4.4
As Spark 1.5.2 doesn't support subqueries in WHERE conditions:
https://issues.apache.org/jira/browse/SPARK-4226
Is there any alternative way to find foreign key constraints?
Would really appreciate the help.
Thanks,
Divya
OK, you can see that process 10603 Worker is running as the worker/slave
in your driver manager connection to GUI port webui-port 8081,
spark://ES01:7077, which you can access through the web.
Also you have process 12420 running as SparkSubmit; that is telling you the
JVM you have submitted for this
I am trying to write incoming stream data to a database. The following is the
example program; this code creates a thread to listen to an incoming stream of
data, which is CSV data. This data needs to be split on a delimiter, and the
array of data needs to be pushed to the database as separate columns in the
Hello,
I am new to Spark and I am currently learning how to use classification
algorithms with it.
For now I am playing with a rather small dataset and training a decision
tree on my laptop (running with --master local[1]).
However, I systematically see that my jobs are hanging forever at the