Hi,
Cody's right.
subscribe - Topic subscription strategy that accepts topic names as a
comma-separated string, e.g. topic1,topic2,topic3 [1]
subscribePattern - Topic subscription strategy that uses a Java
java.util.regex.Pattern to match the names of the topics to subscribe to
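For example, a minimal Scala sketch of both strategies (the broker address
and topic names are made up):

// Strategy 1: explicit comma-separated list of topics
val byList = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1,topic2,topic3")
  .load()

// Strategy 2: regex over topic names (java.util.regex.Pattern syntax)
val byPattern = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribePattern", "topic[0-9]+")
  .load()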
Looks like my problem was the order of awaitTermination() for some reason.
*Doesn't work*
outputDS1.writeStream().trigger(Trigger.ProcessingTime(1000))
  .foreach(new KafkaSink("hello1")).start().awaitTermination()
outputDS2.writeStream().trigger(Trigger.ProcessingTime(1000))
  .foreach(new KafkaSink("hello2")).start().awaitTermination()
Hi,
Ah, right! Start the queries and once they're running, awaitTermination
them.
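Something like this, i.e. call start() on both queries before any
awaitTermination (a minimal Scala sketch reusing the names from the
question):

import org.apache.spark.sql.streaming.Trigger

val query1 = outputDS1.writeStream
  .trigger(Trigger.ProcessingTime(1000))
  .foreach(new KafkaSink("hello1"))
  .start()

val query2 = outputDS2.writeStream
  .trigger(Trigger.ProcessingTime(1000))
  .foreach(new KafkaSink("hello2"))
  .start()

// Only now block the driver; both queries are already running.
query1.awaitTermination()
query2.awaitTermination()
// Or block on whichever query finishes first:
// spark.streams.awaitAnyTermination()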
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Hi,
What's the code in readFromKafka to read from hello2 and hello1?
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me
Looks like my problem was the order of awaitTermination() for some reason.
Doesn't work
On Tue, Sep 19, 2017 at 1:54 PM, kant kodali wrote:
> Hi All,
>
> I have the following pseudo code (I could paste the real code, however it
> is pretty long and involves Database
To set Spark 2 as the default, refer to
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_admin.html#default_tools
-----Original Message-----
From: Gaurav1809 [mailto:gauravhpan...@gmail.com]
Sent: Wednesday, September 20, 2017 9:16 AM
To: user@spark.apache.org
Subject: Cloudera - How
Hello all,
I downloaded CDH and it comes with Spark 1.6.
As per the step-by-step guide, I added Spark 2 to the services list.
Now I can see both Spark 1.6 and Spark 2, but when I run spark-shell in a
terminal window, it starts with Spark 1.6 only. How do I switch to Spark 2?
What all _HOMEs or
Hi All,
I have the following pseudo code (I could paste the real code, however it is
pretty long and involves database calls inside a dataset.map operation and so
on), so I am just trying to simplify my question. I would like to know if
there is something wrong with the following pseudo code?
DataSet
You should be able to pass a comma-separated string of topics to
subscribe. subscribePattern isn't necessary.
On Tue, Sep 19, 2017 at 2:54 PM, kant kodali wrote:
> got it! Sorry.
>
> On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote:
>>
>> Hi,
>>
>>
This may also be related to
https://issues.apache.org/jira/browse/SPARK-22033
On Tue, Sep 19, 2017 at 3:40 PM, Mark Bittmann wrote:
> I've run into this before. The EigenValueDecomposition creates a Java
> Array with 2*k*n elements. The Java Array is indexed with a native
got it! Sorry.
On Tue, Sep 19, 2017 at 12:52 PM, Jacek Laskowski wrote:
> Hi,
>
> Use subscribePattern
>
> You haven't googled well enough --> https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html :)
>
> Pozdrawiam,
> Jacek
Hi,
Use subscribePattern
You haven't googled well enough --> https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html :)
Pozdrawiam,
Jacek Laskowski
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
Hi All,
I am wondering how to read from multiple Kafka topics using Structured
Streaming (code below)? I googled prior to asking this question and I see
responses related to DStreams but not Structured Streaming. Is it possible
to read multiple topics using the same Spark structured stream?
I've run into this before. The EigenValueDecomposition creates a Java Array
with 2*k*n elements. The Java Array is indexed with a native integer type,
so 2*k*n cannot exceed Integer.MAX_VALUE.
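Plugging in the k and n reported elsewhere in this thread (a quick Scala
check):

// Sanity check of the bound with k and n from the original report:
val k = 49865L
val n = 191077L
val required = 2L * k * n                 // 19,056,109,210 array elements
println(required > Int.MaxValue.toLong)   // true -> requirement fails
                                          // (max is 2,147,483,647)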
The array is created here:
https://github.com/apache/spark/blob/master/mllib/src/
I ended up doing this for the time being. It works, but I *think* that
timestamp seems like a rational partitionColumn and I'm wondering if
there's a more built-in way:
> df = spark.read.jdbc(
>     url=os.environ["JDBCURL"],
>     table="schema.table",
>     predicates=predicates
> )
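There is a built-in alternative via partitionColumn; a Scala sketch under the
assumption that the timestamp is stored numerically (the column name and
bounds below are made up). In this Spark version partitionColumn must be
numeric, so a timestamp kept as epoch seconds would serve:

val df = spark.read
  .format("jdbc")
  .option("url", sys.env("JDBCURL"))
  .option("dbtable", "schema.table")
  .option("partitionColumn", "created_at_epoch")  // hypothetical column
  .option("lowerBound", "1483228800")             // 2017-01-01, epoch seconds
  .option("upperBound", "1506384000")             // ~2017-09-26
  .option("numPartitions", "8")
  .load()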
Have you tried to cache? Maybe after the collect() and before the map?
> On Sep 19, 2017, at 7:20 AM, Daniel O' Shaughnessy
> wrote:
>
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow solution
> and not
Hello guys,
While trying to compute an SVD using the computeSVD() function, I am getting
the following warning with the follow-up exception:
17/09/14 12:29:02 WARN RowMatrix: computing svd with k=49865 and n=191077,
please check necessity
IllegalArgumentException: u'requirement failed: k = 49865 and/or
How big is the list of fruits in your example? Can you broadcast it?
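If it is small, something along these lines (a sketch only; the fruit values
are made up, while dfWithSchema and event_name come from the code quoted
later in this digest):

// Broadcast a small lookup set once instead of capturing a large local
// list in every task, then reference it inside the map.
val fruits = spark.sparkContext.broadcast(Set("apple", "banana", "orange"))
val matched = dfWithSchema.select("event_name").rdd
  .map(_.getString(0).split(",").map(_.trim).filter(fruits.value.contains).toList)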
On Tue, 19 Sep 2017 at 9:21 pm, Daniel O' Shaughnessy <
danieljamesda...@gmail.com> wrote:
> Thanks for your response Jean.
>
> I managed to figure this out in the end but it's an extremely slow
> solution and not tenable for
Hi,
I've just noticed the new "avg hash probe" metric for the HashAggregateExec
operator [1].
My understanding (after briefly going through the code in the change and
around it) is as follows:
Average hash map probes per lookup (i.e. `numProbes` / `numKeyLookups`)
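As a made-up illustration: numProbes = 150 against numKeyLookups = 100 gives
1.5, i.e. each key lookup needed 1.5 hash probes on average, while 1.0 would
mean every lookup found its slot on the first probe.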
NOTE: `numProbes` and `numKeyLookups`
Thanks for your response Jean.
I managed to figure this out in the end but it's an extremely slow solution
and not tenable for my use-case:
val rddX = dfWithSchema.select("event_name").rdd
  .map(_.getString(0).split(",").map(_.trim.replaceAll("[\\[\\]\"]", "")).toList)
//val oneRow =
Thanks Cody,
It worked for me by keeping the number of executors (each with 1 core) equal
to the number of Kafka partitions.
On Mon, Sep 18, 2017 at 8:47 PM Cody Koeninger wrote:
> Have you searched in jira, e.g.
>
> https://issues.apache.org/jira/browse/SPARK-19185
>
> On Mon, Sep 18,
Hi,
I have a CSV dataset which has a column with the store open and close
timings per day, but the data is highly variant, as follows:
Mon-Fri 10am-9pm, Sat 10am-8pm, Sun 12pm-6pm
Mon-Sat 10am-8pm, Sun Closed
Mon-Sat 10am-8pm, Sun 10am-6pm
Mon-Friday 9-8 / Saturday 10-7 /
Hi All,
Apologies for cross-posting; I posted this on the Kafka user group. Is there
any way that I can use the Kerberos ticket cache of the Spark executor for
reading from secured Kafka? I am not 100% sure that the executor would do a
kinit, but I presume so, to be able to run code / read HDFS etc. as the user