Re: [Discuss] Metrics Support for DS V2

2020-01-19 Thread Sandeep Katta
> For example, if a file source needs to report size metrics per row, it'll be super slow.
>
> What metrics should a source report? data size? numFiles? read time?
>
> Shall we show metrics in the SQL web UI as well?
On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <

[Discuss] Metrics Support for DS V2

2020-01-16 Thread Sandeep Katta
Hi Devs, Currently DS V2 does not update any input metrics. SPARK-30362 aims at solving this problem. We can take the approach below: have a marker interface, say "ReportMetrics". If the data source implements this interface, then it will be easy to collect the metrics. For e.g.
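A minimal sketch of what such a marker interface might look like; the trait name "ReportMetrics" comes from the mail, but its members and the `CountingScan` class are assumptions for illustration, not the actual SPARK-30362 API:

```scala
// Hypothetical sketch only: the members below are assumptions, not the
// final SPARK-30362 API. A source that wants its metrics collected mixes
// the trait in; the framework can pattern-match on it at collection time.
trait ReportMetrics {
  def metrics: Map[String, Long]
}

// Illustrative source that counts the rows it has read.
class CountingScan extends ReportMetrics {
  private var rows = 0L
  def readRow(): Unit = rows += 1
  override def metrics: Map[String, Long] = Map("numOutputRows" -> rows)
}

val scan = new CountingScan
scan.readRow(); scan.readRow(); scan.readRow()

// Metrics collection: only sources implementing the marker report anything.
val collected: Map[String, Long] = scan match {
  case r: ReportMetrics => r.metrics
  case _                => Map.empty
}
println(collected) // Map(numOutputRows -> 3)
```

Sources that do not opt in fall through to the empty map, so the collection path stays cheap for them.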

Re: Fw:Re: [VOTE][SPARK-29018][SPIP]:Build spark thrift server based on protocol v11

2019-12-29 Thread Sandeep Katta
> To: dev-ow...@spark.apache.org
> Subject: Re: [VOTE][SPARK-29018][SPIP]: Build spark thrift server based on protocol v11
>
> Add spark-dev group access privilege to Google.

spark-3.0.0-preview release notes link is broken

2019-11-28 Thread Sandeep Katta
Hi, I see that for the preview release, the release notes link is broken. https://spark.apache.org/releases/spark-release-3-0-0-preview.html

Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-28 Thread Sandeep Katta
I feel the https://issues.apache.org/jira/browse/SPARK-26154 bug should be fixed in this release, as it is related to data correctness. On Mon, 28 Jan 2019 at 17:55, Takeshi Yamamuro wrote: > Hi, all > > I checked that the two issues below had been resolved and there is no blocker > for branch-2.3

Re: Why does join use rows that were sent after watermark of 20 seconds?

2018-12-10 Thread Sandeep Katta
Hi Abhijeet, You are using an inner join with unbounded state, which means every row in one stream can match rows in the other stream indefinitely. If you want the intended behaviour, you should add a timestamp condition or a window operator to the join condition. On Mon, 10 Dec 2018 at 5:23 PM, Abhijeet Kumar
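A toy illustration (pure Scala, not Spark internals) of why the time condition matters: with an event-time bound, buffered join state older than the watermark can be dropped, whereas an unconstrained inner join must keep every buffered row forever. The `Event` model and watermark value here are made up for the sketch:

```scala
// Toy model of stream-stream join state, not Spark's implementation.
// Each buffered row carries an event timestamp; a time condition in the
// join lets the engine discard state that can no longer produce a match.
case class Event(key: String, ts: Long)

def pruneState(buffered: Seq[Event], watermarkMs: Long): Seq[Event] =
  buffered.filter(_.ts >= watermarkMs) // rows older than the watermark can never match again

val buffered = Seq(Event("a", 10L), Event("b", 35L), Event("c", 60L))
val kept = pruneState(buffered, watermarkMs = 30L)
println(kept.map(_.key)) // List(b, c): "a" is safely dropped
```

Without the timestamp condition there is no `watermarkMs` to prune against, so `buffered` grows without bound, which is the behaviour Abhijeet observed.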

Re: some doubt on code understanding

2018-10-17 Thread Sandeep Katta
:) Thanks, I am wondering how I missed that :) :) On Wed, 17 Oct 2018 at 21:58, Sean Owen wrote: > "/" is integer division, so "x / y * y" is not x, but more like the > biggest multiple of y that's <= x. > On Wed, Oct 17, 2018 at 11:25 AM Sandeep Katta > w
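Sean's point can be checked in a couple of lines: with integer operands, `/` truncates, so `x / y * y` rounds x down to a multiple of y rather than returning x. The values here are arbitrary examples:

```scala
val x = 17
val y = 5
// Integer division truncates toward zero, so the intermediate 17 / 5 is 3,
// and 3 * 5 = 15: the biggest multiple of y that is <= x (for positive x, y).
val floored = x / y * y
println(floored) // 15, not 17
```

This idiom is common for rounding sizes or offsets down to an alignment boundary.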

some doubt on code understanding

2018-10-17 Thread Sandeep Katta
Is there any specific reason why that code is written like the above, or is it by mistake? Apologies upfront if this is a really silly/basic question. Regards Sandeep Katta

Re: Filtering based on a float value with more than one decimal place not working correctly in Pyspark dataframe

2018-09-26 Thread Sandeep Katta
I think it is similar to SPARK-25452. Regards Sandeep Katta On Wed, 26 Sep 2018 at 11:16 AM, Meethu Mathew wrote: > Hi all, > > I tried the following code and the output was not as expected. > > schema = StructType([StructField('Id', Stri
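This class of filter mismatch ultimately comes from binary floating-point precision, which is easy to reproduce outside Spark. The snippet below is a pure-Scala illustration of the underlying issue, not the Spark-side behaviour or fix:

```scala
// A Float literal widened to Double is not the same value as the Double
// literal with the same decimal digits, so an equality filter on a
// float-typed value compared against a double literal can silently miss.
val f: Float  = 0.1f
val d: Double = 0.1
println(f.toDouble)      // 0.10000000149011612
println(f.toDouble == d) // false
```

The same effect applies to a FloatType column filtered by a literal with more than one decimal place: the stored float and the double literal are different binary values.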

Pool Information Details cannot be accessed from HistoryServer UI

2018-09-06 Thread Sandeep Katta
IllegalArgumentException(s"Unknown pool: $poolName") } As per the code it is clear the HistoryServer does not have a SparkContext, so it can't get pool details. Do you guys think it is required to support this for the HistoryServer, or is this valid behaviour? Regards Sandeep Katta

Re: Query on Spark Hive with kerberos Enabled on Kubernetes

2018-07-20 Thread Sandeep Katta
Can you please tell us what exception you got? Any logs for the same? On Fri, 20 Jul 2018 at 8:36 PM, Garlapati, Suryanarayana (Nokia - IN/Bangalore) wrote: > Hi All, > > I am trying to use Spark 2.2.0 Kubernetes( > https://github.com/apache-spark-on-k8s/spark/tree/v2.2.0-kubernetes-0.5.0) >

Re: CVE-2018-8024 Apache Spark XSS vulnerability in UI

2018-07-17 Thread Sandeep Katta
Sandeep Katta On Thu, 12 Jul 2018 at 1:47 AM, Sean Owen wrote: > Severity: Medium > > Vendor: The Apache Software Foundation > > Versions Affected: > Spark versions through 2.1.2 > Spark 2.2.0 through 2.2.1 > Spark 2.3.0 > > Description: > In Apache Spark up to an

Re: [Spark-Core]port opened by the SparkDriver is vulnerable to flooding attacks

2018-02-28 Thread Sandeep Katta
Yes, the monitor is present, but in some cases, such as long-running jobs, I found the App Master is idle, so it will end up closing the App Master's channel and the job will not complete. So a mechanism is needed to close only invalid connections. On Wed, 28 Feb 2018 at 10:54 PM, Marcelo Vanzin