Unsubscribe
Hi everyone,
My name is Alex and I've been using Spark for the past 4 years to solve
most, if not all, of my data processing challenges. From time to time I go
a bit left field with this :). Like embedding Spark in my JVM-based
application running only in `local` mode and using it as a real
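For anyone curious what that embedded setup can look like, here is a minimal sketch (the app name and thread count are illustrative assumptions, not taken from the message above):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: a SparkSession created in-process with a local master,
    // so the host application needs no external cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("embedded-spark")
      .getOrCreate()

    // ... use spark.read / spark.sql / Dataset APIs here ...
    spark.stop()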
Hi Christophe,
Thank you for the explanation!
Regards,
Alex
From: Christophe Préaud
Sent: Wednesday, March 30, 2022 3:43 PM
To: Alex Kosberg ; user@spark.apache.org
Subject: [EXTERNAL] Re: spark ETL and spark thrift server running together
Hi Alex,
As stated in the Hive documentation
(https
Hi,
Some details:
* Spark SQL (version 3.2.1)
* Driver: Hive JDBC (version 2.3.9)
* ThriftCLIService: Starting ThriftBinaryCLIService on port 1 with
5...500 worker threads
* The BI tool is connected via an ODBC driver
After activating Spark Thrift Server I'm unable to
ad.RLock' object
gf> Can you please tell me how to do this?
gf> Or at least give me some advice?
gf> Sincerely,
gf> FARCY Guillaume.
ders.java:178)
AS> at
java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
AS> Thanks
AS>
AS> Amit
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
>> at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>>
>> at
>> org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>>
>> at org.apache.spark.sql.execution.streaming.StreamExecution.org
>> $apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:286)
>>
>> at
>> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:209)
>>
>> obj.test_ingest_incremental_data_batch1()
>>
>> File
>> "C:\Users\agundapaneni\Development\ModernDataEstate\tests\test_mdefbasic.py",
>> line 56, in test_ingest_incremental_data_batch1
>>
>> mdef.ingest_incremental_data('example', entity,
>> self.schemas['studentattendance'], 'school_year')
>>
>> File
>> "C:\Users\agundapaneni\Development\ModernDataEstate/src\MDEFBasic.py", line
>> 109, in ingest_incremental_data
>>
>> query.awaitTermination() # block until query is terminated, with
>> stop() or with error; A StreamingQueryException will be thrown if an
>> exception occurs.
>>
>> File
>> "C:\Users\agundapaneni\Development\ModernDataEstate\.tox\default\lib\site-packages\pyspark\sql\streaming.py",
>> line 101, in awaitTermination
>>
>> return self._jsq.awaitTermination()
>>
>> File
>> "C:\Users\agundapaneni\Development\ModernDataEstate\.tox\default\lib\site-packages\py4j\java_gateway.py",
>> line 1309, in __call__
>>
>> return_value = get_return_value(
>>
>> File
>> "C:\Users\agundapaneni\Development\ModernDataEstate\.tox\default\lib\site-packages\pyspark\sql\utils.py",
>> line 117, in deco
>>
>> raise converted from None
>>
>> pyspark.sql.utils.StreamingQueryException:
>> org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkFieldNames(Lscala/collection/Seq;)V
>>
>> === Streaming Query ===
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
second one does not.
S> Is there any solution to the problem of being able to write to multiple
sinks in Continuous Trigger Mode using Structured Streaming?
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
Hello,
This question has been addressed on Stack Overflow using the spark shell,
but not PySpark.
I found in the Spark SQL documentation that in PySpark I can load
a JAR into my SparkSession config, such as:
spark = SparkSession\
    .builder\
    .appName("appname")\
at the end of the read
operation using the current API? If not, I would ask if this might be a
useful addition, or if there are design reasons for not including such a
step.
Thanks,
Alex
AS> connector.
AS> Thanks
AS> Amit
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
ugh a dependency is specified. Is there any way to fix this? The
Zeppelin version is
s> 0.9.0, the Spark version is 2.4.6, and the Kafka version is 2.4.1. I have specified
the dependency
s> in the packages and added a jar file containing the Kafka 0-10 streaming integration.
--
With best wishes,
reamingQuery =
org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@3990c36c
scala> -------------------------------------------
Batch: 0
-------------------------------------------
+---------+--------+-----+-------+
|firstName|lastName|color|   mood|
+---------+--------+-----+-------+
|         |    Suzy|     | Samson|
|         |     Jim|     |Johnson|
+---------+--------+-----+-------+
See the raw bytes:
$ kt consume -topic persons-avro-spark9
{
"partition": 0,
"offset": 0,
"key": null,
"value":
"\u\u0008Suzy\u\u000cSamson\u\u0008blue\u\u000egrimmer",
"timestamp": "2020-05-12T17:18:53.858-04:00"
}
{
"partition": 0,
"offset": 1,
"key": null,
"value":
"\u\u0006Jim\u\u000eJohnson\u\u000cindigo\u\u0008grim",
"timestamp": "2020-05-12T17:18:53.859-04:00"
}
Thanks,
Alex.
tasks...
Srinivas V at "Sat, 18 Apr 2020 10:32:33 +0530" wrote:
SV> Thank you Alex. I will check it out and let you know if I have any
questions
SV> On Fri, Apr 17, 2020 at 11:36 PM Alex Ott wrote:
SV> http://shop.oreilly.com/product/0636920047568.do has quite go
out best cluster size and number of executors and cores
required.
--
With best wishes, Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)
unsubscribe
unsubscribe
Hi Raman,
The banzaicloud jar can also cover the JMX exports.
Thanks,
Alex
On Fri, Sep 13, 2019 at 8:46 AM raman gugnani
wrote:
> Hi Alex,
>
> Thanks will check this out.
>
> Can it be done directly as spark also exposes the metrics or JVM. In
> this my one doubt is how t
).
Thanks,
Alex
On Fri, Sep 13, 2019 at 7:58 AM raman gugnani
wrote:
> Hi Team,
>
> I am new to spark. I am using spark on hortonworks dataplatform with
> amazon EC2 machines. I am running spark in cluster mode with yarn.
>
> I need to monitor individual JVMs and o
it may cause OOM errors.
Thanks,
Alex
On Mon, Aug 19, 2019 at 11:24 PM Rishikesh Gawade
wrote:
> Hi All,
> I have been trying to serialize a dataframe in protobuf format. So far, I
> have been able to serialize every row of the dataframe by using map
> function and the logic for s
Thanks Jungtaek Lim,
I upgraded the cluster to 2.4.3 and it worked fine.
Thanks,
Alex
On Mon, Aug 19, 2019 at 10:01 PM Jungtaek Lim wrote:
> Hi Alex,
>
> you seem to hit SPARK-26606 [1] which has been fixed in 2.4.1. Could you
> try it out with latest version?
>
> Than
uments
// print the arguments
listOfArguments.asScala.foreach(a => println(s"ARG: $a"))
I see that for client mode I get :
ARG: -XX:+HeapDumpOnOutOfMemoryError
while in cluster mode I get:
ARG: -Dspark.driver.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError
I would appreciate your help in working around this issue.
Thanks,
Alex
Hi Keith,
I don't think that we keep such references.
But we do experience exceptions during the job execution that we catch and
retry (timeouts/network issues from different data sources).
Can they affect RDD cleanup?
Thanks,
Alex
On Sun, Jul 21, 2019 at 10:49 PM Keith Chapman
wrote:
>
,
Alex
On Sun, Jul 21, 2019 at 9:06 AM Prathmesh Ranaut Gmail <
prathmesh.ran...@gmail.com> wrote:
> This is the job of the ContextCleaner. There are a few properties that you can
> tweak to see if that helps:
> spark.cleaner.periodicGC.interval
> spark.cleaner
to clean old shuffle data (as it should).
How can I configure Spark to delete old shuffle data during the lifetime of
the application (not after)?
Thanks,
Alex
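For readers landing on this thread, a hedged sketch of applying the ContextCleaner settings mentioned above (the 10-minute interval is an illustrative assumption, not a recommendation):

    import org.apache.spark.sql.SparkSession

    // Sketch only: run the cleaner's periodic GC more often so shuffle files
    // whose RDDs are no longer referenced are removed sooner, during the
    // lifetime of the application.
    val spark = SparkSession.builder()
      .config("spark.cleaner.periodicGC.interval", "10min")
      .config("spark.cleaner.referenceTracking.blocking", "true")
      .getOrCreate()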
of learning than using an enterprise
cluster. Depending on which route you take, if you decide to focus on
PySpark, learning scikit-learn will give you a lot of transferable
skills.
One final note, I am providing the suggestion from the perspective of a
data scientist.
Kind regards,
Alex Reda
O
Following up on the release date for Spark 3: any guesstimate or rough
estimate without commitment would be helpful :)
Cheers,
Alex
On Mon, Jun 10, 2019 at 5:24 PM Alex Dettinger
wrote:
> Hi guys,
>
> I was not able to find the foreseen release date for Spark 3.
> Would
Hi guys,
I was not able to find the foreseen release date for Spark 3.
Would anyone have any information on this, please?
Many thanks,
Alex
I don't know if this is a bug or a feature, but it's a bit counter-intuitive
when reading code.
The "b" dataframe does not have field "bar" in its schema, but is still able to
filter on that field.
scala> val a = sc.parallelize(Seq((1,10),(2,20))).toDF("foo","bar")
a:
/AlexHagerman/pyspark-profiling
Thanks,
Alex
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql.types import ArrayType
from pyspark.sql.functions import broadcast, udf
from pyspark.ml.feature import Word2Vec, Word2VecModel
from pyspark.ml.linalg import Vector, VectorUDT
it.
On Mon, Feb 26, 2018 at 5:47 PM, naresh Goud <nareshgoud.du...@gmail.com>
wrote:
> does this help?
>
> sc.parallelize(List((1,10),(2,20))).toDF("foo","bar").map(("
> foo","bar")=>("foo",("foo","bar"))
$ cat json-out/foo=1/part-3-18ca93d0-c3b1-424b-8ad5-291d8a29523b.json
{"bar":10}
$ cat json-out/foo=2/part-7-18ca93d0-c3b1-424b-8ad5-291d8a29523b.json
{"bar":20}
Thanks,
Alex.
Does the Kinesis connector for Structured Streaming auto-scale receivers if a
cluster is using dynamic allocation and auto-scaling?
to be usable. Has anyone had a similar experience
or has had better luck?
Alex.
Hi,
I started Spark Streaming job with 96 executors which reads from 96 Kafka
partitions and applies mapWithState on the incoming DStream.
Why would it cache only 77 partitions? Do I have to allocate more memory?
Currently each executor gets 10 GB and it is not clear why it can't cache all
96
Filed SPARK-22200
From: "Mikhailau, Alex" <alex.mikhai...@mlb.com>
Date: Wednesday, October 4, 2017 at 10:43 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Re-sharded kinesis stream starts generating warnings after kinesis
shard numbers w
-4454
With 2.2.0
-Alex
From: "Mikhailau, Alex" <alex.mikhai...@mlb.com>
Date: Wednesday, September 13, 2017 at 4:16 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re-sharded kinesis stream starts generating warnings after kinesis
shard numbe
Has anyone seen the following warnings in the log after a kinesis stream has
been re-sharded?
com.amazonaws.services.kinesis.clientlibrary.lib.worker.ProcessTask
WARN Cannot get the shard for this ProcessTask, so duplicate KPL user records
in the event of resharding will not be dropped during
How do I create a JIRA issue and associate it with a PR that I created for a
bug in master?
https://github.com/apache/spark/pull/19210
eter. In my Graphite, Spark is recording metrics with duplicate metrics
prefix:
$env.$namespace.$team.$app.$env.$namespace.$team.$app
Has anyone else run into this?
Alex
Guys,
I have a Spark 2.1.1 job with Kinesis that is failing to launch 50 active
receivers, even with an oversized cluster on EMR YARN. It registers sometimes 16,
sometimes 32, other times 48 receivers, but never all 50. Any help would be
greatly appreciated.
Kinesis stream shards = 500
YARN EMR
I am getting the following in the logs:
Sink class org.apache.spark.metrics.sink.CloudwatchSink cannot be instantiated
due to CloudwatchSink ClassNotFoundException. I am running this on EMR 5.7.0.
Does anyone have experience adding this sink to an EMR cluster?
Thanks,
Alex
t;Mikhailau, Alex" <alex.mikhai...@mlb.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Referencing YARN application id, YARN container hostname, Executor
ID and YARN attempt for jobs running on Spark EMR 5.7.0 in log statements?
Each java proc
. Is there an
MDC-based way with Spark, or some other way to achieve this?
Alex
From: Vadim Semenov <vadim.seme...@datadoghq.com>
Date: Monday, August 28, 2017 at 5:18 PM
To: "Mikhailau, Alex" <alex.mikhai...@mlb.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Sub
Does anyone have a working solution for logging YARN application id, YARN
container hostname, Executor ID and YARN attempt for jobs running on Spark EMR
5.7.0 in log statements? Are there specific ENV variables available or other
workflow for doing that?
Thank you
Alex
Thanks, Marcelo. Will give it a shot tomorrow.
-Alex
On 8/9/17, 5:59 PM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
Jars distributed using --jars are not added to the system classpath,
so log4j cannot see them.
To work around that, you need to manually ad
nstantiate class [net.logstash.log4j.JSONEventLayoutV1].
java.lang.ClassNotFoundException: net.logstash.log4j.JSONEventLayoutV1
Am I doing something wrong?
Thank you,
Alex
Guys,
I am trying hard to make a DStream API Spark Streaming job work on EMR. I’ve
succeeded to the point of running it for a few hours before it eventually fails,
which is when I start seeing out-of-memory exceptions in the “yarn logs”
aggregate.
I am doing a JSON map and extraction of some
Hi ,
I am using Spark 1.6. How can I ignore this warning? Because of this Illegal
state exception, my scheduled production jobs are showing as completed
abnormally... I can't even handle the exception, because after sc.stop, if I try to
execute any code again, this exception is thrown from the catch block.. so I
Good day everyone!
Have you tried to de-duplicate records based on Avro-generated classes? These
classes extend SpecificRecord, which has equals and hashCode implementations,
although when I try to use .distinct on my PairRDD (both key and value are Avro
classes), it eliminates records which
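One hedged workaround for this situation (pairRdd is a placeholder name): build the de-duplication key from plain Scala values instead of relying on the Avro objects' equals/hashCode, which may not behave as expected after serialization round-trips:

    // Sketch only: de-duplicate by a stable string key derived from the Avro
    // records rather than by the SpecificRecord instances themselves.
    val deduped = pairRdd
      .map { case (k, v) => ((k.toString, v.toString), (k, v)) }
      .reduceByKey((first, _) => first)   // keep one record per logical key
      .values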
>> arguments, depending on the Spark version?
>>
>>
>>
>>
>>
>> *From:* kant kodali [mailto:kanth...@gmail.com]
>> *Sent:* Friday, February 17, 2017 5:03 PM
>> *To:* Alex Kozlov <ale...@gmail.com>
>> *Cc:* user @spark <user@spark.apach
increase number of parallel tasks running from
> 4 to 16 so I exported an env variable called SPARK_WORKER_CORES=16 in
> conf/spark-env.sh. I thought that should do it, but it doesn't. It still
> shows me 4. any idea?
>
>
> Thanks much!
>
>
>
--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com
ww.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may
Hi,
Please reply?
On Fri, Feb 3, 2017 at 8:19 PM, Alex <siri8...@gmail.com> wrote:
> Hi,
>
> can You guys tell me if below peice of two codes are returning the same
> thing?
>
> (((DoubleObjectInspector) ins2).get(obj)); and (DoubleWritable)obj).get()
> ; from be
Hi,
Can you guys tell me if the two pieces of code below return the same
thing?
(((DoubleObjectInspector) ins2).get(obj)); and ((DoubleWritable) obj).get(); from
the two codes below
code 1)
public Object get(Object name) {
    int pos = getPos((String) name);
    if (pos < 0) return null;
: Inline image 1]
On Thu, Feb 2, 2017 at 3:33 PM, Alex <siri8...@gmail.com> wrote:
> Hi As shown below same query when ran back to back showing inconsistent
> results..
>
> testtable1 is Avro Serde table...
>
> [image: Inline image 1]
>
>
>
> hc.sql("sel
Hi, as shown below, the same query, when run back to back, shows inconsistent
results..
testtable1 is Avro Serde table...
[image: Inline image 1]
hc.sql("select * from testtable1 order by col1 limit 1").collect;
res14: Array[org.apache.spark.sql.Row] =
the same Java UDF using Spark SQL,
or
would you recode all the Java UDFs as Scala UDFs and then run them?
Regards,
Alex
convert values to another type depending on what is the type of
> the original value?
> Kr
>
>
>
> On 1 Feb 2017 5:56 am, "Alex" <siri8...@gmail.com> wrote:
>
> Hi ,
>
>
> we have Java Hive UDFS which are working perfectly fine in Hive
>
> S
Hi,
we have Java Hive UDFs which work perfectly fine in Hive.
For better performance we are migrating them to Spark SQL.
We pass these jar files to spark-sql with the --jars argument
and define temporary functions to make them run on spark-sql.
There is this particular Java UDF
Guys! Please Reply
On Tue, Jan 31, 2017 at 12:31 PM, Alex <siri8...@gmail.com> wrote:
> public Object get(Object name) {
> int pos = getPos((String) name);
> if (pos < 0)
> return null;
>
Hi All,
I am trying to run a Hive UDF in spark-sql, and it gives different rows as
results in Hive and Spark..
My UDF query looks something like this:
select col1,col2,col3, sum(col4) col4, sum(col5) col5,Group_name
from
(select inline(myudf('cons1',record))
from table1) test group by
Hi Guys
Please let me know if there are any other ways to typecast, as the code below
throws an error: unable to cast java.lang.Long to LongWritable (and the same for
Double and Text) in spark-sql. The piece of code below is from a Hive UDF which I am
trying to run in spark-sql.
public Object get(Object name) {
public Object get(Object name) {
    int pos = getPos((String) name);
    if (pos < 0)
        return null;
    String f = "string";
    Object obj = list.get(pos);
    Object result = null;
    if (obj ==
Hi All,
If I modify the code as below, the Hive UDF works in spark-sql, but it
gives different results.. Please let me know the difference between the
two pieces of code below..
1) public Object get(Object name) {
       int pos = getPos((String) name);
       if (pos < 0) return null;
How do I debug Hive UDFs?!
On Jan 24, 2017 5:29 PM, "Sirisha Cheruvu" wrote:
> Hi Team,
>
> I am trying to keep below code in get method and calling that get mthod in
> another hive UDF
> and running the hive UDF using Hive Context.sql procedure..
>
>
> switch (f) {
> case
Hi Team,
How do I compare two Avro-format Hive tables to check whether they contain the same data?
If I use LIMIT 5, it gives different results.
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error:
java.lang.Double cannot be cast to
org.apache.hadoop.hive.serde2.io.DoubleWritable]
I am getting the below error while running a Hive UDF on Spark, but the UDF works
perfectly fine in Hive:
public Object get(Object name) {
) and most likely also a distributed
> file system. Spark supports through the Hadoop apis a wide range of file
> systems, but does not need HDFS for persistence. You can have local
> filesystem (ie any file system mounted to a node, so also distributed ones,
> such as zfs), cloud file systems (
Hi All,
Thanks for your response. Please find the flow diagram below.
Please help me simplify this architecture using Spark.
1) Can I skip steps 1 to 4 and directly store the data in Spark?
If I store it in Spark, where does it actually get stored?
Do I need to retain Hadoop to store the data?
Can you ask for eee inbetween each reassign? The memory address at the end
1ec5bf62 != 2c6beb3e or 66cb003 – so what’s going on there?
From: Yang [mailto:tedd...@gmail.com]
Sent: 21 December 2016 18:37
To: user
Subject: spark-shell fails to redefine values
summary:
_1, x._2 + y._2, x._3 + y._3, x._4 + y._4))
Kind Regards,
Alex.
d. In your example, what happens if data is of only 2 rows?
> On 27 Jul 2016 00:57, "Alex Nastetsky" <alex.nastet...@vervemobile.com>
> wrote:
>
>> Spark SQL has a "first" function that returns the first item in a group.
>> Is there a similar function
Spark SQL has a "first" function that returns the first item in a group. Is
there a similar function, perhaps in a third party lib, that allows you to
return an arbitrary (e.g. 3rd) item from the group? Was thinking of writing
a UDAF for it, but didn't want to reinvent the wheel. My end goal is to
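For reference, a hedged sketch of one way to take the nth row per group without a custom UDAF, using a window function (df, groupCol and orderCol are placeholder names):

    // Sketch only: number the rows inside each group and keep the nth one.
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val n = 3
    val w = Window.partitionBy(col("groupCol")).orderBy(col("orderCol"))
    val nthPerGroup = df
      .withColumn("rn", row_number().over(w))
      .filter(col("rn") === n)
      .drop("rn")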
scaling (not blocking the resources if there is no data in the stream) and
the UI to manage the running jobs.
Thanks, Alex.
rectly?
Thanks, Alex.
>> message enhancer and then finally a processor.
>> I thought about using data cache as well for serving the data
>> The data cache should have the capability to serve the historical data
>> in milliseconds (may be upto 30 days of data)
>> --
>> Thanks
>> Deepak
>> www.bigdatabig.com
>>
>>
--
Alex Kozlov
ale...@gmail.com
Hi Vinay,
I believe it's not possible, as the spark-shuffle code has to run in the
same JVM process as the NodeManager. I haven't heard anything about on-the-fly
bytecode loading in the NodeManager.
Thanks, Alex.
On Wed, Mar 16, 2016 at 10:12 AM, Vinay Kashyap <vinu.k...@gmail.com> wrote:
ame()
>> in SparkR to avoid such covering.
>>
>>
>>
>> *From:* Alex Kozlov [mailto:ale...@gmail.com]
>> *Sent:* Tuesday, March 15, 2016 2:59 PM
>> *To:* roni <roni.epi...@gmail.com>
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: sparkR is
I am not using any Spark function, so I would expect
> it to work as simple R code.
> Why does it not work?
>
> Appreciate the help
> -R
>
>
--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com
to find a solution in the meantime.
Thanks,
Alex
On 3/8/2016 4:00 PM, Mich Talebzadeh wrote:
The current scenario resembles a three tier architecture but without
the security of second tier. In a typical three-tier you have users
connecting to the application server (read Hive server2
iveServer2.
--Alex
On 3/8/2016 3:13 PM, Mich Talebzadeh wrote:
Hi,
What do you mean by Hive Metastore Client? Are you referring to Hive
server login much like beeline?
Spark uses hive-site.xml to get the details of Hive metastore and the
login to the metastore which could be any database. Mine
As of Spark 1.6.0 it is now possible to create new Hive Context sessions
sharing various components, but right now the Hive Metastore client is
shared amongst all new Hive Context sessions.
Are there any plans to create individual Metastore Clients for each Hive
Context?
Related to the question
; as separate mount points)
>
> My question is why not raid? What is the argument\reason for not using
> Raid?
>
> Thanks!
> -Eddie
>
--
Alex Kozlov
meaningful.
Cheers, Alex.
On Thu, Mar 3, 2016 at 8:39 AM, Angel Angel <areyouange...@gmail.com> wrote:
> Hello Sir/Madam,
>
> I am try to sort the RDD using *sortByKey* function but i am getting the
> following error.
>
>
> My code is
> 1) convert the rdd array i
Hi Moshir,
Regarding the streaming, you can take a look at Spark Streaming, the
micro-batching framework. If it satisfies your needs, it has a bunch of
integrations; the source for the jobs could be Kafka, Flume or Akka.
Cheers, Alex.
On Mon, Feb 29, 2016 at 2:48 PM, moshir mikael
Hi Moshir,
I think you can use the REST API provided with Spark:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala
Unfortunately, I haven't found any documentation, but it looks fine.
Thanks, Alex.
On Sun, Feb 28, 2016 at 3:25
Hi Igor,
That's a great talk and an exact answer to my question. Thank you.
Cheers, Alex.
On Tue, Feb 23, 2016 at 8:27 PM, Igor Berman <igor.ber...@gmail.com> wrote:
>
> http://www.slideshare.net/cloudera/top-5-mistakes-to-avoid-when-writing-apache-spark-applications
>
>
-side
join with a bigger table. What other considerations should I keep in mind in
order to choose the right configuration?
Thanks, Alex.
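For context, a hedged sketch of the map-side (broadcast) join being discussed (table and column names are placeholders):

    // Sketch only: explicitly broadcast the smaller table so the join is done
    // map-side; spark.sql.autoBroadcastJoinThreshold governs the automatic case.
    import org.apache.spark.sql.functions.broadcast
    val joined = bigDf.join(broadcast(smallDf), Seq("key"))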
Hi Saif,
You can put your files into one directory and read it as text. Another
option is to read them separately and then union the datasets.
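Both options, sketched with placeholder paths (spark refers to a SparkSession; older releases would use sqlContext and unionAll instead):

    // Option 1 (sketch): point the reader at the directory and read all files at once.
    val all = spark.read.text("/data/input-dir/")

    // Option 2 (sketch): read the files separately and union the results.
    val part1 = spark.read.text("/data/input-dir/file1.txt")
    val part2 = spark.read.text("/data/input-dir/file2.txt")
    val combined = part1.union(part2)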
Thanks, Alex.
On Mon, Feb 22, 2016 at 4:25 PM, <saif.a.ell...@wellsfargo.com> wrote:
> Hello all, I am facing a silly data question.
>
>
is the overhead which consumes that much memory during persisting to
disk, and how can I estimate what extra memory I should give to the
executors in order to make it not fail?
Thanks, Alex.
Hi Mich,
Try to use a regexp to parse your string instead of the split.
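A hedged sketch of the regex approach (the pattern and field layout are illustrative assumptions, not the actual data from this thread):

    // Sketch only: extract fields with a regex instead of split(","), which
    // breaks when a field itself contains the delimiter.
    val line = """123,"Acme, Inc.",2016-02-18"""
    val pattern = """^(\d+),"(.*)",(\d{4}-\d{2}-\d{2})$""".r
    line match {
      case pattern(id, name, date) => println(s"$id | $name | $date")
      case _                       => println("no match found")
    }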
Thanks, Alex.
On Thu, Feb 18, 2016 at 6:35 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>
> thanks,
>
>
>
> I have an issue here.
>
> define rdd to rea
eDataFrame(resultRdd).write.orc("..path..")
Please note that resultRdd should contain Products (e.g. case classes).
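A hedged end-to-end sketch of that advice (the case class and output path are illustrative; spark refers to a SparkSession, while older releases used a HiveContext for ORC):

    // Sketch only: rows as a case class (a Product), so the schema can be
    // inferred before writing ORC.
    case class Result(id: Long, score: Double)
    val resultRdd = sc.parallelize(Seq(Result(1L, 0.5), Result(2L, 0.9)))
    spark.createDataFrame(resultRdd).write.orc("/tmp/results-orc")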
Cheers, Alex.
On Wed, Feb 17, 2016 at 11:43 PM, Mich Talebzadeh <
mich.talebza...@cloudtechnologypartners.co.uk> wrote:
> Hi,
>
> We put csv files that a
Hello all,
Is anybody aware of any plans to support cartesian for Datasets? Are there
any ways to work around this issue without switching to RDDs?
Thanks, Alex.
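For what it's worth, a hedged sketch of non-RDD workarounds in later releases (crossJoin was added in Spark 2.1; joinWith with a constant-true condition is an older alternative):

    // Sketch only (Spark 2.1+): explicit Cartesian product of two Datasets.
    val cart = ds1.crossJoin(ds2)

    // Alternative sketch for typed Datasets: a joinWith on a constant-true
    // condition keeps both sides as typed values.
    import org.apache.spark.sql.functions.lit
    val pairs = ds1.joinWith(ds2, lit(true))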
map-in-Spark-tp26224.html
--
Alex Kozlov
(408) 507-4987
(650) 887-2135 efax
ale...@gmail.com
>>>
>>> Ideally I'd like Spark cores just be available in total and the first
>>> app who needs it, takes as much as required from the available at the
>>> moment. Is it possible? I believe Mesos is able to set resources free if
>>> they're not in use. Is it possible with YARN?
>>>
>>> I'd appreciate if you could share your thoughts or experience on the
>>> subject.
>>>
>>> Thanks.
>>> --
>>> Be well!
>>> Jean Morozov
>>>
>>
--
Alex Kozlov
ale...@gmail.com
As a user of AWS EMR (running Spark and MapReduce), I am interested in
potential benefits that I may gain from Databricks Cloud. I was wondering
if anyone has used both and done comparison / contrast between the two
services.
In general, which resource manager(s) does Databricks Cloud use for
val dateFormat = format.DateTimeFormat.forPattern("yyyy-MM-dd")
val tranDate = dateFormat.parseDateTime(someDateString)
Alex
-Original Message-
From: Andrew Holway [mailto:andrew.hol...@otternetworks.de]
Sent: 21 January 2016 19:25
To: user@spark.apache.org
Subject: Date /
I forgot to add this is (I think) from 1.5.0.
And yeah, that looks like Python – I’m not hot with Python, but it may need to be
capitalised as False or FALSE?
From: Eli Super [mailto:eli.su...@gmail.com]
Sent: 21 January 2016 14:48
To: Spencer, Alex (Santander)
Cc: user@spark.apache.org
Subject: Re
I'll try the hackier way for now - given the limitation of not being able to
modify the environment we've been given.
Thanks all for your help so far.
Kind Regards,
Alex.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: 15 January 2016 12:17
To: Spencer, Alex