Re: Passing an array of more than 22 elements in a UDF
What's the advantage of using that specific version for this? Please shed some light on it.

On Mon, Dec 25, 2017 at 6:51 AM, Felix Cheung wrote:
> Or use it with Scala 2.11?
>
> *From:* ayan guha
> *Sent:* Friday, December 22, 2017 3:15:14 AM
> *To:* Aakash Basu
> *Cc:* user
> *Subject:* Re: Passing an array of more than 22 elements in a UDF
>
> Hi, I think you are on the right track. You can pack all your parameters into a suitable data structure, such as an array or a dict, and pass that structure as a single parameter to your UDF.
>
> On Fri, 22 Dec 2017 at 2:55 pm, Aakash Basu wrote:
>
>> Hi,
>>
>> I am using Spark 2.2 with Java. Can anyone suggest how to take more than 22 parameters in a UDF? For instance, could I pass all the parameters as an array of integers?
>>
>> Thanks,
>> Aakash.
>
> --
> Best Regards,
> Ayan Guha
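The packing idea from the reply can be sketched in plain Python, independent of Spark (the function name and its logic are made up for illustration): instead of a UDF whose signature lists dozens of scalar parameters, the caller packs them into one array and the UDF takes a single argument. In PySpark or a Java UDF1, the same one-argument function would sidestep the 22-parameter signature limit entirely.

```python
# The thread's suggestion: rather than a UDF with dozens of scalar
# parameters, pack them into one array (or dict) and pass that single
# structure as the UDF's only argument.
def score(params):
    """Hypothetical UDF body: sum even-indexed params minus odd-indexed
    ones (purely illustrative logic)."""
    return sum(p if i % 2 == 0 else -p for i, p in enumerate(params))

# 30 parameters -- more than the 22 a UDF signature could list individually.
packed = list(range(30))
result = score(packed)
```

Registered in Spark, only the single array column crosses the UDF boundary, so the number of logical parameters no longer matters.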
Re: Which kafka client to use with spark streaming
Hey Serkan, it depends on your Kafka version... Is it 0.8.2?

On 25 Dec 2017 06:17, "Serkan TAS" wrote:
> Hi,
>
> I am working with a Spark 2.2.0 cluster and Kafka 1.0 brokers.
>
> I was using the library
>
> "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"
>
> and had lots of problems during the streaming process, then downgraded to
>
> "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.2.0"
>
> I know there is also another path, which is using the kafka-clients jars, whose latest version is 1.0.0:
>
> <dependency>
>   <groupId>org.apache.kafka</groupId>
>   <artifactId>kafka-clients</artifactId>
>   <version>1.0.0</version>
> </dependency>
>
> I am confused about which path is the right one.
>
> Thanks…
>
> --
> This communication may contain information that is legally privileged, confidential or exempt from disclosure. If you are not the intended recipient, please note that any dissemination, distribution, or copying of this communication is strictly prohibited. Anyone who receives this message in error should notify the sender immediately by telephone or by return communication and delete it from his or her computer. Only the person who has sent this message is responsible for its content.
Is there a way to make the broker merge big result set faster?
Hi, community,

I have a subquery running slow on a Druid cluster. The inner query yields these fields:

SELECT D1, D2, D3, MAX(M1) AS MAX_M1
FROM SOME_TABLE
GROUP BY D1, D2, D3

Then the outer query looks like:

SELECT D1, D2, SUM(MAX_M1)
FROM INNER_QUERY
GROUP BY D1, D2

D3 is a high-cardinality dimension, which makes the result set of the inner query very large. Still, the inner query itself takes only 1-2 seconds to process and transfer the data to the broker. The outer query, however, takes 40 seconds to process. As far as I understand how the broker works with the historicals, Druid simply fetches the result for each segment directly from the historicals' memory for the inner query, so there isn't much computation when Druid handles the inner query. But once the inner query finishes, all the results from the historicals are passed to a single broker for merging. In my case, because the result set of the inner query is tremendous, this phase takes a long time to finish. The situation mentioned in this thread is quite similar to my case: https://groups.google.com/d/msg/druid-user/ir7hRpxg0PI/3oqCDAwoPjMJ Gian mentioned "historical merging", and I have tried that by disabling the broker cache, but it didn't really make the query faster. Is there any other way to make the broker merge faster?

Thanks!
Best regards,
Mu
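For reference, the two-level aggregation described above corresponds to a nested groupBy query, where the outer query uses the inner query as its datasource. A minimal sketch, assuming the dimension and metric names from the SQL above (the datasource, intervals, and aggregator types are placeholders to be adapted):

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "SOME_TABLE",
      "granularity": "all",
      "dimensions": ["D1", "D2", "D3"],
      "aggregations": [
        { "type": "doubleMax", "name": "MAX_M1", "fieldName": "M1" }
      ],
      "intervals": ["2017-01-01/2018-01-01"]
    }
  },
  "granularity": "all",
  "dimensions": ["D1", "D2"],
  "aggregations": [
    { "type": "doubleSum", "name": "SUM_MAX_M1", "fieldName": "MAX_M1" }
  ],
  "intervals": ["2017-01-01/2018-01-01"]
}
```

With this shape, the inner result set is materialized on the broker before the outer aggregation runs, which matches the merge bottleneck described in the question.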
Re: Apache Spark - (2.2.0) - window function for DataSet
The window function requires a timestamp column because a function (such as an aggregation) is applied for each window. You can still use a UDF for customized tasks.

On 25 Dec 2017 20:15, "M Singh" wrote:
> Hi:
> I would like to use a window function on a DataSet stream (Spark 2.2.0).
> The window function requires a Column as argument and can be used with DataFrames by passing the column. Is there any analogous window function, or pointers to how a window function can be used, for DataSets?
>
> Thanks
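Conceptually, what Spark's window function does is bucket each row's timestamp into a fixed-duration window and then group by that bucket, which is why a timestamp column is required. A stdlib-only sketch of that bucketing, independent of Spark (the 10-minute duration and the sample timestamps are illustrative):

```python
from datetime import datetime, timedelta

def tumbling_window(ts: datetime, duration: timedelta) -> datetime:
    """Return the start of the tumbling window containing ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % duration   # position of ts inside its window
    return ts - offset                 # window start

# Bucket a few event timestamps into 10-minute windows.
duration = timedelta(minutes=10)
events = [datetime(2017, 12, 25, 20, 3),
          datetime(2017, 12, 25, 20, 7),
          datetime(2017, 12, 25, 20, 14)]
buckets = [tumbling_window(t, duration) for t in events]
```

Grouping rows by this bucket and then aggregating is the same shape of computation that grouping by a window over the timestamp column performs inside Spark.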
Re: Apache Spark - Structured Streaming from file - checkpointing
Can you please post your code here?

On 25 Dec 2017 19:24, "M Singh" wrote:
> Hi:
>
> I am using Spark Structured Streaming (v 2.2.0) to read data from files, and I have configured a checkpoint location. On stopping and restarting the application, it looks like it is reading the previously ingested files. Is that expected behavior?
>
> Is there any way to prevent reading files that have already been ingested? If a file is partially ingested, on restart can we start reading the file from the previously checkpointed offset?
>
> Thanks
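For context on the question, a file-source checkpoint behaves roughly like a persistent log of already-processed file names that is consulted on restart, so committed files are skipped. A minimal stdlib-only sketch of that idea (the log format here is illustrative, not Spark's actual checkpoint layout):

```python
import os

def ingest_new_files(files, seen_log):
    """Process only files not yet recorded in seen_log, then record them."""
    seen = set()
    if os.path.exists(seen_log):
        with open(seen_log) as f:
            seen = set(line.strip() for line in f)
    newly_processed = [name for name in files if name not in seen]
    with open(seen_log, "a") as f:
        for name in newly_processed:
            # Commit: once recorded, the file is skipped after a restart.
            f.write(name + "\n")
    return newly_processed
```

A restart that replays files would indicate the commit step did not happen for those files, which is why seeing the code (and the checkpoint configuration) matters here.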
Re: Apache Spark - Structured Streaming graceful shutdown
Hi M Singh! Here I'm using query.stop().

On 25 Dec 2017 19:19, "M Singh" wrote:
> Hi:
> Are there any patterns/recommendations for gracefully stopping a structured streaming application?
> Thanks
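One common shape for the query.stop() approach is a control loop that polls for an external signal (such as a marker file) and stops the query when it appears. A stdlib-only sketch of that loop, with a stand-in object in place of a real Spark StreamingQuery (the marker-file convention is an assumption, not a Spark API):

```python
import os
import time

class DummyQuery:
    """Stand-in for a StreamingQuery; only stop() and is_active are modeled."""
    def __init__(self):
        self.is_active = True

    def stop(self):
        self.is_active = False

def await_shutdown(query, marker_path, poll_secs=0.01):
    """Poll for a marker file; stop the query when it appears."""
    while query.is_active:
        if os.path.exists(marker_path):
            query.stop()   # with a real query, this ends the stream cleanly
        else:
            time.sleep(poll_secs)
```

Checking the signal between polls, rather than killing the process, gives the current micro-batch a chance to finish before the query is stopped.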
Apache Spark - (2.2.0) - window function for DataSet
Hi:
I would like to use a window function on a DataSet stream (Spark 2.2.0). The window function requires a Column as argument and can be used with DataFrames by passing the column. Is there any analogous window function, or pointers to how a window function can be used, for DataSets?

Thanks
Apache Spark - Structured Streaming from file - checkpointing
Hi:
I am using Spark Structured Streaming (v 2.2.0) to read data from files, and I have configured a checkpoint location. On stopping and restarting the application, it looks like it is reading the previously ingested files. Is that expected behavior?

Is there any way to prevent reading files that have already been ingested? If a file is partially ingested, on restart can we start reading the file from the previously checkpointed offset?

Thanks
Apache Spark - Structured Streaming graceful shutdown
Hi:
Are there any patterns/recommendations for gracefully stopping a structured streaming application?
Thanks
Re: Spark Docker
You can find several presentations on this on the Spark Summit web page. Generally, you also have to decide whether to run one cluster for all applications or one cluster per application in the container context. I am not sure, though, why you want to run on just one node; if you have only one node, then Spark may not be the right choice at all.

> On 25. Dec 2017, at 09:54, sujeet jog wrote:
>
> Folks,
>
> Can you share your experience of running Spark under Docker on a single local/standalone node? Is anybody using it in production environments? We have an existing Docker Swarm deployment, and I want to run Spark in a separate fat VM hooked to and controlled by Docker Swarm.
>
> I know there is no official clustering support for running Spark under Docker Swarm, but can it be used to run on a single fat VM controlled by Swarm?
>
> Any insights on this would be appreciated, along with production-mode experiences, etc.
>
> Thanks,
> Sujeet
Spark Docker
Folks,

Can you share your experience of running Spark under Docker on a single local/standalone node? Is anybody using it in production environments? We have an existing Docker Swarm deployment, and I want to run Spark in a separate fat VM hooked to and controlled by Docker Swarm.

I know there is no official clustering support for running Spark under Docker Swarm, but can it be used to run on a single fat VM controlled by Swarm?

Any insights on this would be appreciated, along with production-mode experiences, etc.

Thanks,
Sujeet
Which kafka client to use with spark streaming
Hi,

I am working with a Spark 2.2.0 cluster and Kafka 1.0 brokers.

I was using the library

"org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.2.0"

and had lots of problems during the streaming process, then downgraded to

"org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.2.0"

I know there is also another path, which is using the kafka-clients jars, whose latest version is 1.0.0:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>1.0.0</version>
</dependency>

I am confused about which path is the right one.

Thanks…
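As I understand the integration options, the 0-10 artifact is generally the appropriate choice against 1.0 brokers, since Kafka brokers accept older client protocols and the 0-10 integration uses the newer consumer API, while the 0-8 integration targets the legacy API. The kafka-clients jar is pulled in transitively by the integration artifact, so it normally does not need to be declared separately. A sketch of the sbt dependency this would amount to (Spark 2.2.0, Scala 2.11 assumed, matching the versions quoted above):

```scala
// build.sbt (fragment) -- the %% operator appends the Scala suffix (_2.11)
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
```

If problems persist with 0-10 against newer brokers, pinning a matching kafka-clients version explicitly is sometimes discussed, but mixing client versions by hand is the riskier of the two paths.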