Hello,
Has anybody tried to union two streams of Twitter Statuses? I am
instantiating two Twitter streams with two different sets of credentials
and passing them through a union function, but the console does not show
any activity, nor are there any errors.
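For reference, a minimal sketch of how two Twitter DStreams are typically unioned, assuming the spark-streaming-twitter connector; the credential values are placeholders, and a missing start()/awaitTermination() or too few local threads are common reasons the console stays silent:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.twitter.TwitterUtils;
import twitter4j.Status;
import twitter4j.auth.Authorization;
import twitter4j.auth.OAuthAuthorization;
import twitter4j.conf.ConfigurationBuilder;

// each receiver occupies a core, so with two receivers local[2] would starve the job
SparkConf conf = new SparkConf().setAppName("twitter-union").setMaster("local[4]");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

// placeholder credentials; build one Authorization per credential set
Authorization auth1 = new OAuthAuthorization(new ConfigurationBuilder()
    .setOAuthConsumerKey("key1").setOAuthConsumerSecret("secret1")
    .setOAuthAccessToken("token1").setOAuthAccessTokenSecret("tokenSecret1").build());
Authorization auth2 = new OAuthAuthorization(new ConfigurationBuilder()
    .setOAuthConsumerKey("key2").setOAuthConsumerSecret("secret2")
    .setOAuthAccessToken("token2").setOAuthAccessTokenSecret("tokenSecret2").build());

JavaDStream<Status> stream1 = TwitterUtils.createStream(jssc, auth1);
JavaDStream<Status> stream2 = TwitterUtils.createStream(jssc, auth2);

// merge the two streams and print a sample of each batch
JavaDStream<Status> merged = stream1.union(stream2);
merged.map(Status::getText).print();

// without these two calls nothing is ever computed or printed
jssc.start();
jssc.awaitTermination();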
--
Richard
> From: Imran Rajjad <raj...@gmail.com>
> Date: Wednesday, December 6, 2017 at 2:45 AM
> To: "user @spark" <user@spark.apache.org>
> Subject: unable to connect to cluster 2.2.0
>
> Hi,
Hi,
I recently upgraded from 2.1.1 to 2.2.0, and my streaming job seems to have
broken: the submitted application is unable to connect to the cluster, even
though everything is running.
Below is my stack trace:
Spark Master: spark://192.168.10.207:7077
Job Arguments:
-appName orange_watch -directory
Greetings,
I am running a unit test designed to stream a folder into which I am
manually copying CSV files. The files do not always get picked up; they are
only detected when the job starts with the files already in the folder.
I even tried the fileNameOnly option newly included in 2.2.0.
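For context, a minimal sketch of the file-source stream being described (directory path, schema, and the console sink are placeholders); the source only picks up files that appear in the directory after the query starts, and they should be moved in atomically:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.types.StructType;

SparkSession spark = SparkSession.builder().appName("csv-stream").getOrCreate();

// file sources require the schema up front
StructType schema = new StructType()
    .add("id", "integer")
    .add("value", "string");

Dataset<Row> csvStream = spark.readStream()
    .schema(schema)
    .option("fileNameOnly", "true") // 2.2.0+: track seen files by name instead of full path
    .csv("/path/to/folder");        // placeholder directory

StreamingQuery query = csvStream.writeStream()
    .format("console")
    .start();
query.awaitTermination();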
Hi,
Is the in-memory table into which Structured Streaming results are sunk
available to other Spark applications on the cluster? Is it global by
default, or available only to the context where the streaming is being
done?
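For reference, a minimal sketch of the memory sink in question (`counts` and the query name are placeholders). As far as I know, the table lives in the driver of the SparkSession that registered it, so it is visible only within that application, not globally across the cluster:

import org.apache.spark.sql.streaming.StreamingQuery;

// results are collected back to the driver and exposed as an in-memory temp table
StreamingQuery query = counts.writeStream()   // `counts` is a placeholder streaming Dataset
    .outputMode("complete")
    .format("memory")
    .queryName("stream_results")              // becomes the temp table name
    .start();

// queryable only from the same SparkSession
spark.sql("SELECT * FROM stream_results").show();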
thanks
Imran
--
I.R
I am trying out the network word count example, and my unit test is
producing the console output below, with an exception:
Exception in thread "dispatcher-event-loop-5"
java.lang.NoClassDefFoundError:
scala/runtime/AbstractPartialFunction$mcVL$sp
at java.lang.ClassLoader.defineClass1(Native Method)
Strangely, this works only for a very small set of rows; for very large
datasets the partitioning apparently does not work. Is there a limit to the
number of columns or rows when repartitioning by multiple columns?
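For reference, a sketch of repartitioning by multiple columns, with placeholder names; as far as I know there is no documented limit on the number of columns, though all rows sharing a key combination hash to the same partition, so a skewed key can overload a single partition on large data:

import static org.apache.spark.sql.functions.col;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// hash-partition by the combination of two columns (names are placeholders)
Dataset<Row> byCols = df.repartition(col("col1"), col("col2"));

// or fix the partition count explicitly
Dataset<Row> byColsFixed = df.repartition(200, col("col1"), col("col2"));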
regards,
Imran
On Wed, Oct 18, 2017 at 11:00 AM, Imran Rajjad
A simple way is to have a network volume mounted with the same name on all
nodes, to make things easy.
On Thu, 19 Oct 2017 at 8:24 PM Uğur Sopaoğlu wrote:
> Hello,
>
> I have a very simple problem: when I run a Spark job, I must copy the jar
> file to all worker nodes. Is there an easier way to do this?
--
[12,24,3,1]
>>>>
[12,22,4,1]
>>>>
[11,22,1,1]
[11,22,2,1]
>>>>
[11,21,1,1]
>>>>
[13,22,4,1]
>>>>
On Wed, Oct 18, 2017 at 10:29 AM, ayan guha <guha.a...@gmail.com> wrote:
> How or what yo
I don't think so. Check out the documentation for this method.
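A minimal sketch of the difference, with a placeholder path and `sc` as an existing JavaSparkContext: setting rdd = null only drops the driver-side reference, while unpersist() actually asks the executors to free the cached blocks:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.storage.StorageLevel;

JavaRDD<String> rdd = sc.textFile("/some/path").persist(StorageLevel.MEMORY_ONLY());

// ... use the cached RDD ...

rdd.unpersist(); // frees the cached blocks on the executors
rdd = null;      // only clears the local reference; the blocks would otherwise
                 // linger until the RDD object is garbage collected and cleaned up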
On Wed, Oct 18, 2017 at 10:11 AM, Mina Aslani <aslanim...@gmail.com> wrote:
> I have not tried rdd.unpersist(); I thought setting rdd = null was the
> same. Is it not?
>
> On Wed, Oct 18, 2017 at 1:07 AM, Imran Rajjad
Hi,
I have a set of rows that are the result of a groupBy(col1, col2, col3).count().
Is it possible to map the rows belonging to each unique combination inside
an iterator? e.g.
col1 col2 col3
a 1 a1
a 1 a2
b 2 b1
b 2 b2
How can I separate the rows where col1 and col2 =
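One possible approach, sketched with placeholder names: groupByKey/mapGroups hands the group function an iterator over all rows sharing one (col1, col2) combination:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.MapGroupsFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

Dataset<Row> counted = df.groupBy("col1", "col2", "col3").count();

Dataset<String> perCombination = counted
    // key each row by its (col1, col2) combination
    .groupByKey((MapFunction<Row, String>) r ->
        r.getAs("col1") + "|" + r.getAs("col2"), Encoders.STRING())
    // the iterator yields every row of one combination
    .mapGroups((MapGroupsFunction<String, Row, String>) (key, rows) -> {
        long distinctCol3 = 0;
        while (rows.hasNext()) { rows.next(); distinctCol3++; }
        return key + " -> " + distinctCol3 + " distinct col3 values";
    }, Encoders.STRING());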
Did you try calling rdd.unpersist()?
On Wed, Oct 18, 2017 at 10:04 AM, Mina Aslani wrote:
> Hi,
>
> I get "No space left on device" error in my spark worker:
>
> Error writing stream to file /usr/spark-2.2.0/work/app-.../0/stderr
> java.io.IOException: No space left on
Is there a way around implementing a separate Java class that implements
the Serializable interface for even small, petty arithmetic operations?
Below is code from the simple decision tree example:
Double testMSE = predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
@Override
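For comparison, a sketch of the same computation with a Java 8 lambda, which avoids the separate anonymous class because Spark's Function interfaces are functional interfaces and lambdas serialize cleanly here (predictionAndLabel and testData follow the MLlib example and are assumed to exist):

// lambda form: no explicit class implementing Serializable is needed
Double testMSE = predictionAndLabel
    .map(pl -> {
        double diff = pl._1() - pl._2();
        return diff * diff;
    })
    .reduce((a, b) -> a + b) / testData.count();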
If the datasets hold objects of different classes, then you will have to
give both of them the same column names, convert them to RDDs, and then
call rdd1.subtract(rdd2).
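A minimal sketch of that suggestion, with placeholder datasets and column names; this assumes both sides can be projected onto an identical Row schema so that the rows compare equal:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// project both typed datasets onto the same column names
Dataset<Row> left  = ds1.toDF("id", "name");  // ds1: Dataset<ClassA>, placeholder
Dataset<Row> right = ds2.toDF("id", "name");  // ds2: Dataset<ClassB>, placeholder

// rows present in `left` but not in `right`
JavaRDD<Row> diff = left.toJavaRDD().subtract(right.toJavaRDD());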
On Thu, Oct 12, 2017 at 10:16 PM, Shashikant Kulkarni <
shashikant.kulka...@gmail.com> wrote:
> Hello,
>
> I have 2 datasets,
spatial
> and logical operators combined? Maybe I am not understanding the issue
> properly
>
> Ram
>
> On Tue, Oct 10, 2017 at 2:21 AM, Imran Rajjad <raj...@gmail.com> wrote:
>
>> I need to have a location column inside my Dataframe so that I can do
>> spati
I need to have a location column inside my DataFrame so that I can do
spatial queries and geometry operations. Are there any third-party packages
that perform these kinds of operations? I have seen a few, like GeoSpark and
Magellan, but they don't support operations where spatial and logical
operators
Try Tachyon; it's less fuss.
On Fri, 29 Sep 2017 at 8:32 PM lucas.g...@gmail.com
wrote:
> We use S3; there are caveats and issues with that, but it can be made to
> work.
>
> If interested let me know and I'll show you our workarounds. I wouldn't
> do it naively though,
but I
don't have many dependencies apart from a few POJOs, which have been
included through the context
regards,
Imran
On Wed, Sep 20, 2017 at 9:00 PM, Felix Cheung <felixcheun...@hotmail.com>
wrote:
> Could you include the code where it fails?
> Generally the best way to use gf is to use
Trying to run GraphFrames on a Spark cluster. Do I need to include the
package in the Spark context settings, or is only the driver program
supposed to have the GraphFrames libraries on its classpath? Currently the
job crashes when an action function is invoked on GraphFrame classes.
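A sketch of one way to ship the package to the executors rather than only the driver, using the coordinates from the GraphFrames docs; the Scala suffix must match the Scala version of your Spark build, so treat the exact artifact name as an assumption:

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
    .setAppName("graphframes-job")
    // resolved via Ivy and distributed to the driver and all executors
    .set("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11");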
regards,
alexander.com/blog/post/java/java-xmx-xms-memory-heap-size-control
>
> On Thu, Sep 7, 2017 at 12:16 PM, Imran Rajjad <raj...@gmail.com> wrote:
>
>> I am getting Out of Memory error while running connectedComponents job on
>> graph with around 12000 vertices and 1346
I am getting an Out of Memory error while running a connectedComponents job
on a graph with around 12000 vertices and 134600 edges.
I am running Spark in embedded mode in a standalone Java application and
have tried to increase the memory, but it seems that it's not taking any
effect
sparkConf = new
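A sketch of the usual pitfall here, with assumed values: in embedded/local mode the driver JVM is already running by the time SparkConf is built, so a programmatic spark.driver.memory has no effect, and the heap has to be raised on the JVM itself:

import org.apache.spark.SparkConf;

// fine for executor-side settings, but in local mode everything runs in the
// driver JVM, so setting spark.driver.memory here is too late to help
SparkConf sparkConf = new SparkConf()
    .setAppName("connected-components")
    .setMaster("local[*]");

// raise the heap where the JVM starts instead, e.g. (class name is a placeholder):
//   java -Xmx8g -cp app.jar com.example.GraphJob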
Dear list,
I am following the GraphFrames documentation and have started the Scala
shell using the following command:
D:\spark-2.1.0-bin-hadoop2.7\bin>spark-shell --master local[2] --packages
graphframes:graphframes:0.5.0-spark2.1-s_2.10
Ivy Default Cache set to: C:\Users\user\.ivy2\cache
The
The JDBC URL is invalid, but strangely it should have thrown an ORA- exception.
On Mon, Aug 28, 2017 at 4:55 PM, Naga G <gudurun...@gmail.com> wrote:
> Not able to find the database name.
> Is ora the database in the URL below?
>
> Sent from Naga iPad
>
> > On Aug 28, 201
If reading directly from a file, then Spark SQL should be your choice.
On Mon, Aug 28, 2017 at 10:25 PM Michael Artz
wrote:
> Just to be clear, I'm referring to having Spark read from a file, not
> from a Hive table. And it will have the Tungsten engine off-heap
Hello,
I am trying to retrieve an Oracle table into a Dataset using the following
code:
String url = "jdbc:oracle@localhost:1521:ora";
Dataset jdbcDF = spark.read()
.format("jdbc")
.option("driver", "oracle.jdbc.driver.OracleDriver")
.option("url", url)
.option("dbtable",
Dear List,
Are there any future plans to implement cursor reset or previous-record
functionality in the Thrift Server's JDBC driver? Are there any other
alternatives?
java.sql.SQLException: Method not supported
at
org.apache.hive.jdbc.HiveBaseResultSet.previous(HiveBaseResultSet.java:643)
regards
I am unable to register Solr Cloud as a data source in Spark 2.1.0.
Following the documentation at
https://github.com/lucidworks/spark-solr#import-jar-file-via-spark-shell, I
have used the 3.0.0.beta3 version.
The system path is displaying the added jar as
Greetings,
We are trying out Spark 2 + Thrift Server to join multiple
collections from a Solr Cloud (6.4.x). I have followed this blog:
https://lucidworks.com/2015/08/20/solr-spark-sql-datasource/
I understand that initially Spark populates the temporary table with
18633014 records and takes its