Spark Kafka streaming failure recovery scenario
Hi All,

I have a scenario where my streaming application restarts while it is running. After the restart, I basically want to start from the most recent data and then go back and process the older data. Is there any way to do this in Spark, or is such flexibility planned for the future? If my application's downtime is long, I obviously will not be able to show the user any recent data insights for some time, because the backlog has to be processed first.

Please let me know if you have any suggestions.

Thanks,
Sujith
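For context, here is a minimal sketch of how this could be approximated today with Structured Streaming's Kafka source, assuming two separate queries (the broker address, topic name, checkpoint paths, and the saved end offset are all placeholders): the streaming query resumes from the latest offsets so fresh insights appear immediately, while a batch query backfills the range missed during the downtime.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("KafkaRecoverySketch")
  .getOrCreate()

// Query 1: resume from the newest offsets so users see recent data right away.
val recent = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092") // placeholder
  .option("subscribe", "events")                     // placeholder topic
  .option("startingOffsets", "latest")               // skip the backlog
  .load()

recent.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/ckpt-recent")  // placeholder path
  .start()

// Query 2: a separate batch read backfills the window missed while the app
// was down. startingOffsets/endingOffsets take per-partition JSON; -2 means
// earliest. The ending offset 12345 is a hypothetical value saved before the
// restart.
val backfill = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .option("startingOffsets", """{"events":{"0":-2}}""")
  .option("endingOffsets", """{"events":{"0":12345}}""")
  .load()

backfill.write.format("parquet").save("/tmp/backfill") // placeholder sink
```

Spark itself does not process "newest first, then backwards"; the sketch just prioritizes fresh data by running the two reads independently.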
Regarding NimbusDS JOSE JWT jar 3.9 security vulnerability
Hi Folks,

I observed that Spark 2.2.x uses version 3.9 of the NimbusDS JOSE JWT jar, but a few vulnerabilities have been reported for that particular version. Please refer to the details below:

https://nvd.nist.gov/vuln/detail/CVE-2017-12973
https://www.cvedetails.com/cve/CVE-2017-12972/

According to these reports, the vulnerabilities affect all versions prior to 4.39, so we are planning to upgrade this jar. I just wanted to know whether there is any reason why it has not been upgraded in the community release, given that it contains known vulnerabilities.

Appreciate your suggestions.

Thanks,
Sujith
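In case it helps anyone who needs a stopgap before a community upgrade lands, a sketch of forcing a patched version in an application build via sbt (the dependency usually arrives transitively through Hadoop's auth modules; 4.41.1 is just an example of a release at or above the 4.39 fix line, and binary compatibility with Spark 2.2.x should be verified):

```scala
// build.sbt sketch: override the transitive nimbus-jose-jwt version.
// 4.41.1 is an example version above the 4.39 fix threshold reported in
// the advisories; verify compatibility before deploying.
dependencyOverrides += "com.nimbusds" % "nimbus-jose-jwt" % "4.41.1"
```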
About 2.2.1 release
Hi Folks,

I just wanted to ask about the Spark 2.2.1 release: could you please let me know the expected release date for this version?

Thanks,
Sujith
Re: Limit Query Performance Suggestion
Dear Liang,

Thanks for your valuable feedback. There was a mistake in my previous post, which I have now corrected. As you mentioned, `GlobalLimit` only takes the required number of rows from the input iterator, which actually pulls data from local and remote blocks. However, when the limit value is very high (e.g. >= 1000), a shuffle exchange still happens between `GlobalLimit` and `LocalLimit` to move the data from all partitions into a single partition, and because the limit value is so large, the performance bottleneck remains.

In my next post I will publish a test report with sample data, and I am also working out a solution for this problem. Please let me know if you have any clarifications or suggestions regarding this issue.

Regards,
Sujith
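To make the plan shape concrete, here is a small sketch (the data, table, and limit value are arbitrary) that shows the single-partition exchange sitting between `LocalLimit` and `GlobalLimit`. The limit is followed by another operator so the planner does not collapse it into `CollectLimit`:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("LimitPlanSketch").getOrCreate()
import spark.implicits._

val df = (1 to 1000000).toDF("id")

// Each task first applies LocalLimit to its own partition; all surviving
// rows are then shuffled into ONE partition where GlobalLimit runs. With a
// large limit, that Exchange SinglePartition is the bottleneck.
df.limit(100000).filter($"id" > 0).explain()

// Expected physical plan shape (roughly, on Spark 2.x):
//   Filter (id > 0)
//   +- GlobalLimit 100000
//      +- Exchange SinglePartition
//         +- LocalLimit 100000
//            +- ...
```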