A couple of questions regarding UDFs:
1) Is there a way to get all the registered UDFs in Spark (Scala)?
I couldn't find any straightforward API, but I did find a pattern to get all the
registered UDFs:
spark.catalog.listFunctions.filter(_.className == null).collect
This does the trick, but I am not sure it
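For reference, here is a minimal sketch (assuming a SparkSession named spark and Spark 2.x) that registers a UDF and then lists the functions whose className is null, which is how UDFs registered via spark.udf.register tend to show up in the catalog:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("list-udfs").getOrCreate()

    // Register a sample UDF so there is something to list
    spark.udf.register("plusOne", (x: Int) => x + 1)

    // Temporary functions registered via spark.udf.register have no className,
    // so filtering on null is one way to isolate user-registered UDFs
    val udfs = spark.catalog.listFunctions()
      .filter(_.className == null)
      .collect()

    udfs.foreach(f => println(f.name))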
Hi All,
Is there any way to copy all the tables in parallel from an RDBMS using Spark?
We are looking for functionality similar to Sqoop.
Thanks,
Surendra
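I am not aware of a built-in equivalent of Sqoop's import-all-tables, but a rough sketch along these lines may help. The JDBC URL, credentials, table list and output paths below are placeholders; parallelism across tables comes from Scala parallel collections, and per-table parallelism can be added via the partitionColumn options:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("rdbms-copy").getOrCreate()

    val jdbcUrl = "jdbc:mysql://dbhost:3306/mydb"        // placeholder
    val tables  = Seq("customers", "orders", "products") // placeholder table list

    // Copy tables concurrently; each read can additionally be partitioned
    // via partitionColumn/lowerBound/upperBound/numPartitions if needed.
    tables.par.foreach { table =>
      spark.read
        .format("jdbc")
        .option("url", jdbcUrl)
        .option("dbtable", table)
        .option("user", "dbuser")          // placeholder
        .option("password", "dbpassword")  // placeholder
        .load()
        .write
        .mode(SaveMode.Overwrite)
        .parquet(s"/data/landing/$table")
    }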
Hi guys,
Recently I opened a question [1] on Stack Overflow about leader election
with the ZooKeeper high-availability backend. It has puzzled me for some days
and it would really help if you could take a look or even share some
thoughts.
Copying the content to the mailing list:
Spark uses Curator#LeaderLatch for
Hi All,
We have Spark Streaming pipelines (written in Java) currently running on YARN
in production. We are evaluating moving these streaming pipelines onto
Kubernetes. We have set up a working Kubernetes cluster. I have been reading
the Spark documentation and a few other blogs on migrating them to
Hi Mich,
Please take a look at how to write data into Kafka topic with DStreams:
https://github.com/gaborgsomogyi/spark-dstream-secure-kafka-sink-app/blob/62d64ce368bc07b385261f85f44971b32fe41327/src/main/scala/com/cloudera/spark/examples/DirectKafkaSinkWordCount.scala#L77
(DStreams has no native
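For anyone who cannot follow the link, a minimal sketch of one common way to write a DStream to Kafka (not necessarily identical to the linked example) is to create a producer inside foreachRDD/foreachPartition. The DStream, brokers and topic below are placeholders:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical: `words` is an existing DStream[String]; brokers/topic are placeholders
    def writeToKafka(words: DStream[String], brokers: String, topic: String): Unit = {
      words.foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          // One producer per partition per batch; a pooled or broadcast producer is more efficient
          val props = new Properties()
          props.put("bootstrap.servers", brokers)
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          val producer = new KafkaProducer[String, String](props)
          partition.foreach(word => producer.send(new ProducerRecord[String, String](topic, word)))
          producer.close()
        }
      }
    }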
Thanks for your reply. Your help is very valuable and all these links
are helpful (especially your example)
Best Regards
--Iacovos
On 3/27/19 10:42 PM, Luca Canali wrote:
I find that the Spark metrics system is quite useful to gather
resource utilization metrics of Spark applications,
I find that the Spark metrics system is quite useful to gather resource
utilization metrics of Spark applications, including CPU, memory and I/O.
If you are interested, there is an example of how this works for us at:
https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark
If
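For anyone who wants to try this, a minimal sketch of enabling a metrics sink purely through Spark configuration is shown below. The Graphite host, port and the choice of sink here are my own illustration (not taken from the blog post); the spark.metrics.conf.* keys mirror what would otherwise go into metrics.properties:

    import org.apache.spark.sql.SparkSession

    // Route driver/executor metrics to a Graphite-compatible endpoint (host/port are placeholders)
    val spark = SparkSession.builder()
      .appName("metrics-example")
      .config("spark.metrics.conf.*.sink.graphite.class", "org.apache.spark.metrics.sink.GraphiteSink")
      .config("spark.metrics.conf.*.sink.graphite.host", "graphite.example.com")
      .config("spark.metrics.conf.*.sink.graphite.port", "2003")
      .config("spark.metrics.conf.*.sink.graphite.period", "10")
      .config("spark.metrics.conf.*.sink.graphite.unit", "seconds")
      .getOrCreate()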
Hi,
In a traditional setup we get data via Kafka into Spark Streaming, do some work
and write to a NoSQL database like MongoDB, HBase or Aerospike.
That part can be done below and is best explained by the code as follows:
Once a high-value lookups DF is created I want to send the data to a new topic
for
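If the pipeline is (or can be moved to) Structured Streaming, the built-in Kafka sink can be used directly. A rough sketch, assuming a streaming DataFrame of the high-value lookups and placeholder broker/topic names:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, struct, to_json}

    // Hypothetical: highValueDF is an existing streaming DataFrame of the "high value" lookups
    def sendToTopic(highValueDF: DataFrame): Unit = {
      highValueDF
        .select(to_json(struct(col("*"))).alias("value"))   // the Kafka sink expects a "value" column
        .writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
        .option("topic", "high_value_topic")                // placeholder
        .option("checkpointLocation", "/tmp/chk/high_value")
        .start()
    }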
Thanks Gabor - your comment helps me clarify my question.
Yes, I have maxFilesPerTrigger set to 1 on the readStream call. I am also
seeing the Streaming Query process the single input file; however, a single input
file does not appear to result in the Streaming Query writing the output to
Hi Matt,
Maybe you could set maxFilesPerTrigger to 1.
BR,
G
On Wed, Mar 27, 2019 at 4:45 PM Matt Kuiper
wrote:
> Hello,
>
> I am new to Spark and Structured Streaming and have the following File
> Output Sink question:
>
> Wondering what (and how to modify) triggers a Spark Structured
Hello,
I am new to Spark and Structured Streaming and have the following File Output
Sink question:
I am wondering what triggers a Spark Structured Streaming query (with the Parquet
file output sink configured) to write data to the Parquet files, and how to modify that behavior.
I periodically feed the Stream
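To make the earlier suggestion concrete, a minimal sketch of a file-source-to-Parquet-sink query, where maxFilesPerTrigger limits how many new files are picked up per micro-batch and the trigger interval controls how often micro-batches run. The paths, schema and interval are placeholders:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    val spark = SparkSession.builder().appName("file-sink-example").getOrCreate()

    // Placeholder schema for the incoming JSON files
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("payload", StringType)))

    val input = spark.readStream
      .schema(schema)
      .option("maxFilesPerTrigger", "1")   // process at most one new file per micro-batch
      .json("/data/incoming")              // placeholder input path

    val query = input.writeStream
      .format("parquet")
      .option("path", "/data/output")                    // placeholder output path
      .option("checkpointLocation", "/data/checkpoint")  // required for the file sink
      .trigger(Trigger.ProcessingTime("30 seconds"))     // how often a micro-batch is attempted
      .start()

    query.awaitTermination()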
We are using Spark batch to write a DataFrame to a Kafka topic, using the write
function with write.format("kafka").
Does Spark provide a guarantee similar to the one it provides when saving a DataFrame
to disk, i.e. that partial data is not written to Kafka: either the full DataFrame is
saved, or if the job fails no
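For context, a minimal sketch of the kind of batch write being described (the broker, topic and source columns are placeholders). On the atomicity question, each task writes through a regular Kafka producer, so I would not assume an all-or-nothing guarantee without checking the Kafka integration docs:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("batch-kafka-write").getOrCreate()

    // Placeholder source; assume it has columns that can be cast to a key and a value
    val df = spark.read.parquet("/data/source")

    df.selectExpr("CAST(id AS STRING) AS key", "CAST(payload AS STRING) AS value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")  // placeholder
      .option("topic", "my_topic")                        // placeholder
      .save()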
Hi Lian,
many thanks for the detailed information and sharing the solution with us.
I will forward this to a student and hopefully will resolve the issue.
Best regards,
On Wed, Mar 27, 2019 at 1:55 AM Lian Jiang wrote:
> Hi Gezim,
>
> My execution plan of the data frame to write into HDFS is
Hi Vanzin,
"spark.authenticate" is working properly for our environment (Spark 2.4 on
Kubernetes).
We have made few code changes through which secure communication between driver
and executor is working fine using shared spark.authenticate.secret.
Even SASL encryption works but when we set,
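For reference, a rough sketch of the kind of configuration being described (the values are placeholders, and the exact setting that triggers the problem is cut off above):

    import org.apache.spark.SparkConf

    // Placeholder values; spark.network.crypto.enabled is the AES-based alternative to SASL encryption
    val conf = new SparkConf()
      .set("spark.authenticate", "true")
      .set("spark.authenticate.secret", "my-shared-secret")    // shared between driver and executors
      .set("spark.authenticate.enableSaslEncryption", "true")  // SASL-based encryption
      .set("spark.network.crypto.enabled", "true")             // AES-based RPC encryption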