Re: Write Spark Connection client application in Go

2023-09-13 Thread Martin Grund
te().Mode("overwrite").Format("parquet").Save("file:///tmp/spark-connect-write-example-output.parquet") df = spark.Read().Format("parquet").Load("file:///tmp/spar

[Feature Request] make unix_micros() and unix_millis() available in PySpark (pyspark.sql.functions)

2022-10-14 Thread Martin
available in PySpark. Cheers, Martin

Re: Moving to Spark 3x from Spark2

2022-09-01 Thread Martin Andersson
You should check the release notes and upgrade instructions. From: rajat kumar Sent: Thursday, September 1, 2022 12:44 To: user @spark Subject: Moving to Spark 3x from Spark2 EXTERNAL SENDER. Do not click links or open attachments unless you recognize the

Question regarding checkpointing with kafka structured streaming

2022-08-22 Thread Martin Andersson
I was looking around for some documentation regarding how checkpointing (or rather, delivery semantics) is done when consuming from kafka with structured streaming and I stumbled across this old documentation (that still somehow exists in latest versions) at

unsubscribe

2022-08-01 Thread Martin Soch
- To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Issues getting Apache Spark

2022-05-26 Thread Martin, Michael
to work on my laptop. Michael Martin

Re: Spark on K8s - repeating annoying exception

2022-05-13 Thread Martin Grigorov
Hi, On Mon, May 9, 2022 at 5:57 PM Shay Elbaz wrote: > Hi all, > > I apologize for reposting this from Stack Overflow, but it got very little > attention and no comments. > > I'm using a Spark 3.2.1 image that was built from the official distribution > via docker-image-tool.sh, on

Idea for improving performance when reading from hive-like partition folders and specifying a filter [Spark 3.2]

2022-05-01 Thread Martin
rations* (root: 1, year: 3, month: 36, day: 3,240, hour: 8,398,080). For df2 that's 3 list operations; but I did 4 more list operations upfront in order to come up with the relevant_folders list, so *7 list operations* in total (root: 1, year: 1, month: 1, day: 1, hour: 3) Cheers Martin

Re: Spark on K8s , some applications ended ungracefully

2022-04-01 Thread Martin Grigorov
Hi, On Thu, Mar 31, 2022 at 4:18 PM Pralabh Kumar wrote: > Hi Spark Team > > Some of my Spark applications on K8s ended with the below error. These > applications, though completed successfully (as per the SparkListenerApplicationEnd event at the end of the > event log), still have event files with

Re: spark distribution build fails

2022-03-17 Thread Martin Grigorov
Hi, For the mail archives: this error happens when the user has MAVEN_OPTS env var pre-exported. In this case ./build/mvn|sbt does not export its own MAVEN_OPTS with the -XssXYZ value, and the default one is too low and leads to the StackOverflowError On Mon, Mar 14, 2022 at 11:13 PM

Re: Encoders.STRING() causing performance problems in Java application

2022-02-21 Thread martin
string at a time by the self-built prediction pipeline (which is also using other ML techniques apart from Spark). Needs some refactoring... Thanks again for the help. Cheers, Martin On 2022-02-18 13:41, Sean Owen wrote: That doesn't make a lot of sense. Are you profiling the driver, rather

Re: Encoders.STRING() causing performance problems in Java application

2022-02-18 Thread martin
calls instead of running them line by line on the input file. Cheers, Martin On 2022-02-18 09:41, mar...@wunderlich.com wrote: I have been able to partially fix this issue by creating a static final field (i.e. a constant) for Encoders.STRING(). This removes the bottleneck associated

Re: Encoders.STRING() causing performance problems in Java application

2022-02-18 Thread martin
optimize this? Cheers, Martin On 2022-02-18 07:42, mar...@wunderlich.com wrote: Hello, I am working on optimising the performance of a Java ML/NLP application based on Spark / SparkNLP. For prediction, I am applying a trained model on a Spark dataset which consists of one column with only one

Encoders.STRING() causing performance problems in Java application

2022-02-17 Thread martin
prediction method call. So, is there a simpler and more efficient way of creating the required dataset, consisting of one column and one String row? Thanks a lot. Cheers, Martin

Re: how can I remove the warning message

2022-02-04 Thread Martin Grigorov
Hi, This is a JVM warning, as Sean explained. You cannot control it via loggers. You can disable it by passing --illegal-access=permit to java. Read more about it at https://softwaregarden.dev/en/posts/new-java/illegal-access-in-java-16/ On Sun, Jan 30, 2022 at 4:32 PM Sean Owen wrote: > This

Re: Spark 3.1 Json4s-native jar compatibility

2022-02-04 Thread Martin Grigorov
Hi, Amit said that he uses Spark 3.1, so the link should be https://github.com/apache/spark/blob/branch-3.1/pom.xml#L879 (3.7.0-M5) @Amit: check your classpath. Maybe there are more jars of this dependency. On Thu, Feb 3, 2022 at 10:53 PM Sean Owen wrote: > You can look it up: >

Re: [EXTERNAL] Fwd: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Martin Grigorov
Hi, On Mon, Jan 31, 2022 at 7:57 PM KS, Rajabhupati wrote: > Thanks a lot Sean. One final question before I close the conversion how do > we know what are the features that will be added as part of spark 3.3 > version? > There will be release notes for 3.3 at linked at

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Martin Wunderlich
://www.bsi.bund.de/EN/Home/home_node.html Cheers, Martin On 13.12.21 at 17:02, Jörn Franke wrote: Is it in any case appropriate to use log4j 1.x, which is not maintained anymore and has other security vulnerabilities which won't be fixed anymore? On 13.12.2021 at 06:06, Sean Owen wrote: Check

Re: [Spark] Does Spark support backward and forward compatibility?

2021-11-24 Thread Martin Wunderlich
to load a model built with Spark 2.4.4 after updating to 3.2.0. This didn't work. Cheers, Martin On 24.11.21 at 20:18, Sean Owen wrote: I think/hope that it goes without saying you can't mix Spark versions within a cluster. Forwards compatibility is something you don't generally expect

Re: EXT: Re: Create Dataframe from a single String in Java

2021-11-18 Thread martin
Thanks a lot, Sebastian and Vibhor. You're right, I can call createDataset() on the Spark session as well. Not sure how I missed that. Cheers, Martin On 2021-11-18 12:01, Vibhor Gupta wrote: You can try something like below. It creates a dataset and then converts it into a dataframe

Re: Create Dataframe from a single String in Java

2021-11-18 Thread martin
tions/44028677/how-to-create-a-dataframe-from-a-string Basically, what I am looking for is something simple like: Dataset myData = sparkSession.createDataFrame(textList, "text"); Any hints? Thanks a lot. Cheers, Martin

Create Dataframe from a single String in Java

2021-11-18 Thread martin
ate-a-dataframe-from-a-string Basically, what I am looking for is something simple like: Dataset myData = sparkSession.createDataFrame(textList, "text"); Any hints? Thanks a lot. Cheers, Martin

Re: Using MulticlassClassificationEvaluator for NER evaluation

2021-11-11 Thread martin
, we'll need to work around it and possibly create a wrapper evaluator around the Spark standard class. Thanks a lot for the help. Cheers, Martin On 2021-11-11 13:10, Gourav Sengupta wrote: Hi Martin, okay, so you will of course need to translate the NER string output to a numerical

Re: Using MulticlassClassificationEvaluator for NER evaluation

2021-11-11 Thread Martin Wunderlich
labels). Cheers, Martin On 11.11.21 at 11:39, Gourav Sengupta wrote: Hi Martin, just to confirm: you are taking the output of SparkNLP and then trying to feed it to Spark ML for running algorithms on the NER output generated by SparkNLP, right? Regards, Gourav Sengupta On Thu, Nov 11

Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-11-11 Thread martin
-data? This could also be handy not just for things like versioning, but also for storing evaluation metrics together with a trained pipeline (for people who aren't using something like MLflow yet). Cheers, Martin On 2021-10-25 14:38, Sean Owen wrote: You can write a custom Transformer

Re: Using MulticlassClassificationEvaluator for NER evaluation

2021-11-10 Thread martin
have expected the MulticlassClassificationEvaluator to be able to use the labels directly. I will try to create and propose a code change in this regard, if or when I find the time. Cheers, Martin On 2021-10-25 14:31, Sean Owen wrote: I don't think the question is representation as double

Using MulticlassClassificationEvaluator for NER evaluation

2021-10-25 Thread martin
apply MulticlassClassificationEvaluator to the NER task or is there maybe a better evaluator? I haven't found anything yet (neither in Spark ML nor in SparkNLP). Thanks a lot. Cheers, Martin

Re: Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-25 Thread martin
this natively in Spark ML. Otherwise, I'll just create a wrapper class for the trained models. Cheers, Martin On 2021-10-24 21:16, Sonal Goyal wrote: Does MLflow help you? https://mlflow.org/ I don't know if MLflow can save arbitrary key-value pairs and associate them with a model

Feature (?): Setting custom parameters for a Spark MLlib pipeline

2021-10-20 Thread martin
nity think about this proposal? Has it been discussed before perhaps? Any thoughts? Cheers, Martin

Re: Distributing a FlatMap across a Spark Cluster

2021-06-09 Thread Chris Martin
ote: > Yeah to test that I just set the group key to the ID in the record which > is a solr supplied UUID, which means effectively you end up with 4000 > groups now. > > On Wed, Jun 9, 2021 at 5:13 PM Chris Martin wrote: > >> One thing I would check is this line: >&g

Re: Distributing a FlatMap across a Spark Cluster

2021-06-09 Thread Chris Martin
One thing I would check is this line: val fetchedRdd = rdd.map(r => (r.getGroup, r)) how many distinct groups do you end up with? If there's just one then I think you might see the behaviour you observe. Chris On Wed, Jun 9, 2021 at 4:17 PM Tom Barber wrote: > Also just to follow up on

GPU job in Spark 3

2021-04-09 Thread Martin Somers
see what might be failing behind the scenes, any suggestions? Thanks Martin -- M

Structured Streaming together with Cassandra Queries

2018-09-22 Thread Martin Engen
Hello, I have a case where I am continuously getting a bunch of sensor data which is being stored into a Cassandra table (through Kafka). Every week or so, I want to manually enter additional data into the system - and I want this to trigger some calculations merging the manually entered data, and

How to work around NoOffsetForPartitionException when using Spark Streaming

2018-06-01 Thread Martin Peng
quick way to fix the missing offset or work around this? Thanks, Martin 1/06/2018 17:11:02: ERROR:the type of error is org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partition: elasticsearchtopicrealtimereports-97 01/06/2018 17:11:02

Re: Structured Streaming, Reading and Updating a variable

2018-05-16 Thread Martin Engen
util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Any ideas about how to handle this error? Thanks, Martin Engen From: Lalwani, Jayesh <jayesh.lalw...@capitalone.com> Sent: Tu

Structured Streaming, Reading and Updating a variable

2018-05-15 Thread Martin Engen
Hello, I'm working with Structured Streaming, and I need a method of keeping a running average based on last 24hours of data. To help with this, I can use Exponential Smoothing, which means I really only need to store 1 value from a previous calculation into the new, and update this variable
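The running-average idea in this thread can be sketched without any Spark machinery. The snippet below is a plain-Python illustration of one exponential-smoothing step; the smoothing factor `alpha` and the sample readings are made up for illustration (the thread itself is about where to keep this single state value between streaming batches, not about the formula):

```python
def smooth(prev_avg, new_value, alpha=0.5):
    """One exponential-smoothing step: blend the new observation
    with the previous running average."""
    if prev_avg is None:          # the first observation seeds the average
        return new_value
    return alpha * new_value + (1 - alpha) * prev_avg

# Feeding readings one at a time means only a single state value is stored.
avg = None
for reading in [10.0, 12.0, 11.0, 13.0]:
    avg = smooth(avg, reading)
```

The appeal for streaming is exactly what the poster describes: the entire 24-hour window collapses into one number carried from batch to batch.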

Re: Spark Job crash due to File Not found when shuffle intermittently

2017-07-25 Thread Martin Peng
cool~ Thanks Kang! I will check and let you know. Sorry for the delay as there is an urgent customer issue today. Best, Martin 2017-07-24 22:15 GMT-07:00 周康 <zhoukang199...@gmail.com>: > * If the file exists but is a directory rather than a regular file, does > * not exist but canno

Re: Spark Job crash due to File Not found when shuffle intermittently

2017-07-24 Thread Martin Peng
Is there anyone who can share some light on this issue? Thanks Martin 2017-07-21 18:58 GMT-07:00 Martin Peng <wei...@gmail.com>: > Hi, > > I have several Spark jobs, including both batch jobs and stream jobs, to > process the system log and analyze them. We are using Kaf

Spark Job crash due to File Not found when shuffle intermittently

2017-07-21 Thread Martin Peng
exceptions randomly (either after several hours of running, or within just 20 mins). Can anyone give me some suggestions about how to figure out the real root cause? (The Google results don't look very useful...) Thanks, Martin 00:30:04,510 WARN - 17/07/22 00:30:04 WARN TaskSetManager: Lost task 60.0

The stability of Spark Stream Kafka 010

2017-06-29 Thread Martin Peng
-10-integration.html Thanks Martin

stratified sampling scales poorly

2016-12-19 Thread Martin Le
with different sampling fractions (I ran experiments on 4 nodes cluster )? Thank you, Martin
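For reference, the per-stratum Bernoulli sampling that stratified sampling performs can be sketched in plain Python; the function name, its signature, and the fractions below are illustrative assumptions, not Spark's API:

```python
import random

def sample_by_key(records, fractions, seed=42):
    """Bernoulli sampling with a separate fraction per key, analogous in
    spirit to Spark's sampleByKey (names here are illustrative)."""
    rng = random.Random(seed)
    return [(k, v) for k, v in records if rng.random() < fractions.get(k, 0.0)]

# Two strata of 1000 records each, sampled at different rates.
data = [("a", i) for i in range(1000)] + [("b", i) for i in range(1000)]
sample = sample_by_key(data, {"a": 0.1, "b": 0.5})
```

Each record costs one random draw regardless of its fraction, which is one reason to look at per-partition sampling when scaling becomes the bottleneck.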

Re: How to use a custom filesystem provider?

2016-09-21 Thread Jean-Philippe Martin
> > There's a bit of confusion setting in here; the FileSystem implementations > spark uses are subclasses of org.apache.hadoop.fs.FileSystem; the nio > class with the same name is different. > grab the google cloud storage connector and put it on your classpath I was using the gs:// filesystem

How to use a custom filesystem provider?

2016-09-21 Thread Jean-Philippe Martin
The full source for my example is available on github <https://github.com/jean-philippe-martin/SparkRepro>. I'm using maven to depend on gcloud-java-nio <https://mvnrepository.com/artifact/com.google.cloud/gcloud-java-nio/0.2.5>, which provides a Java FileSystem for Google Cloud Stor

DCOS - s3

2016-08-21 Thread Martin Somers
I'm having trouble loading data from an s3 repo. Currently DCOS is running Spark 2, so I'm not sure if there is a modification needed to the code with the upgrade. My code at the moment looks like this: sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx") sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")

Unsubscribe

2016-08-16 Thread Martin Serrano
Sent from my Verizon Wireless 4G LTE DROID

UNSUBSCRIBE

2016-08-10 Thread Martin Somers
-- M

Unsubscribe.

2016-08-09 Thread Martin Somers
Unsubscribe. Thanks M

Re: sampling operation for DStream

2016-08-01 Thread Martin Le
How to do that? If I put the queue inside the .transform operation, it doesn't work. On Mon, Aug 1, 2016 at 6:43 PM, Cody Koeninger <c...@koeninger.org> wrote: > Can you keep a queue per executor in memory? > > On Mon, Aug 1, 2016 at 11:24 AM, Martin Le <martin.leq...@gmail.com>

Re: sampling operation for DStream

2016-08-01 Thread Martin Le
y were > evenly balanced. > > But once you've read the messages, nothing's stopping you from > filtering most of them out before doing further processing. The > dstream .transform method will let you do any filtering / sampling you > could have done on an rdd. > > On

sampling operation for DStream

2016-07-29 Thread Martin Le
sampling operation as well? If not, could you please give me a suggestion how to implement it? Thanks, Martin

Re: libraryDependencies

2016-07-26 Thread Martin Somers
icks.com> wrote: > Also, you'll want all of the various spark versions to be the same. > > On Tue, Jul 26, 2016 at 12:34 PM, Michael Armbrust <mich...@databricks.com > > wrote: > >> If you are using %% (double) then you do not need _2.11. >> >> On Tue, Jul 26,

libraryDependencies

2016-07-26 Thread Martin Somers
my build file looks like libraryDependencies ++= Seq( // other dependencies here "org.apache.spark" %% "spark-core" % "1.6.2" % "provided", "org.apache.spark" %% "spark-mllib_2.11" % "1.6.0", "org.scalanlp" % "breeze_2.11" % "0.7",

sbt build under scala

2016-07-26 Thread Martin Somers
Just wondering what is the correct way of building a Spark job using Scala - are there any changes coming with Spark v2? I've been following this post http://www.infoobjects.com/spark-submit-with-sbt/ Then again I've been mainly using Docker locally; what is a decent container for submitting

SVD output within Spark

2016-07-21 Thread Martin Somers
just looking at a comparison between Matlab and Spark for SVD with an input matrix N. This is the Matlab code - yes, a very small matrix: N = [2.5903 -0.0416 0.6023; -0.1236 2.5596 0.7629; 0.0148 -0.0693 0.2490] U = [-0.3706 -0.9284 0.0273; -0.9287 0.3708

Re: Spark streaming takes longer time to read json into dataframes

2016-07-16 Thread Martin Eden
Hi, I would just do a repartition on the initial direct DStream since otherwise each RDD in the stream has exactly as many partitions as you have partitions in the Kafka topic (in your case 1). Like that receiving is still done in only 1 thread but at least the processing further down is done in

SparkStreaming multiple output operations failure semantics / error propagation

2016-07-14 Thread Martin Eden
Hi, I have a Spark 1.6.2 streaming job with multiple output operations (jobs) doing idempotent changes in different repositories. The problem is that I want to somehow pass errors from one output operation to another such that in the current output operation I only update previously successful

Re: DataFrame versus Dataset creation and usage

2016-06-28 Thread Martin Serrano
:57 PM, Xinh Huynh wrote: Hi Martin, Since your schema is dynamic, how would you use Datasets? Would you know ahead of time the row type T in a Dataset[T]? One option is to start with DataFrames in the beginning of your data pipeline, figure out the field types, and then switch completely over

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Martin Serrano
Indeed. But I'm dealing with 1.6 for now unfortunately. On 06/24/2016 02:30 PM, Ted Yu wrote: In Spark 2.0, Dataset and DataFrame are unified. Would this simplify your use case ? On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano <mar...@attivio.com<mailto:mar...@attivio.com>> wro

DataFrame versus Dataset creation and usage

2016-06-24 Thread Martin Serrano
around. Any advice would be appreciated. Thanks, Martin

How does Spark Streaming updateStateByKey or mapWithState scale with state size?

2016-06-23 Thread Martin Eden
Hi all, It is currently difficult to understand from the Spark docs or the materials online that I came across, how the updateStateByKey and mapWithState operators in Spark Streaming scale with the size of the state and how to reason about sizing the cluster appropriately. According to this
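The state-update contract behind updateStateByKey can be sketched in plain Python; the function name, the running-sum state, and the sample batches below are illustrative assumptions, not Spark code:

```python
def update_state(state, batch_values):
    """updateStateByKey-style reducer: fold one batch's values for a key
    into that key's running state (a running sum in this sketch)."""
    return (state or 0) + sum(batch_values)

# State is keyed, so memory grows with the number of distinct keys seen
# so far, not with the size of any individual batch.
state = {}
for batch in [{"a": [1, 2], "b": [3]}, {"a": [4], "c": [5]}]:
    for key, values in batch.items():
        state[key] = update_state(state.get(key), values)
```

This is also why the sizing question in the thread matters: the state map lives across batches, so its footprint is driven by key cardinality.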

S3n performance (@AaronDavidson)

2016-04-12 Thread Martin Eden
available under DataBricksCloud? How can we benefit from those improvements? Thanks, Martin P.S. Have not tried S3a.

Re: Direct Kafka input stream and window(…) function

2016-03-29 Thread Martin Soch
as expected. Since this is a possible use-case (the API allows it), I would like to know whether I hit some limitation (or bug) in Spark-Kafka. I am using Spark 1.5.0. Thanks Martin On 03/22/2016 06:24 PM, Cody Koeninger wrote: I definitely have direct stream jobs that use window() without pro

Direct Kafka input stream and window(…) function

2016-03-22 Thread Martin Soch
ies when using a different type of stream. Is it some known limitation of the window(..) function when used with a direct-Kafka-input-stream? Java pseudo code: org.apache.spark.streaming.kafka.DirectKafkaInputDStream s; s.window(Durations.seconds(10)).print(); // the pipeline will stop Thanks Mar

Fwd: Connection failure followed by bad shuffle files during shuffle

2016-03-15 Thread Eric Martin
the bug can be more easily investigated? Best, Eric Martin

Problem running JavaDirectKafkaWordCount

2016-03-12 Thread Martin Andreoni
lcome, Thanks. -- ----- MARTIN ANDREONI

Frustration over Spark and Jackson

2016-02-16 Thread Martin Skøtt
of Jackson, but that would require me to change my common code which I would like to avoid. -- Kind regards Martin

Sorry, but Nabble and ML suck

2015-10-31 Thread Martin Senne
Having written a post on last Tuesday, I'm still not able to see my post under nabble. And yeah, subscription to u...@apache.spark.org was successful (rechecked a minute ago) Even more, I have no way (and no confirmation) that my post was accepted, rejected, whatever. This is very L4M3 and so

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Martin Senne
to see things get improved ... Cheers, Martin On 31.10.2015 17:34, "Nicholas Chammas" <nicholas.cham...@gmail.com> wrote: > Nabble is an unofficial archive of this mailing list. I don't know who > runs it, but it's not Apache. There are often delays between when things

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Martin Senne
Ted, thx. Should I repost? On 31.10.2015 17:41, "Ted Yu" <yuzhih...@gmail.com> wrote: > From the result of http://search-hadoop.com/?q=spark+Martin+Senne , > Martin's post from Tuesday didn't go through. > > FYI > > On Sat, Oct 31, 2015 at 9:34 AM, Nicholas Cham

Why does predicate pushdown not work on HiveContext (concrete HiveThriftServer2) ?

2015-10-31 Thread Martin Senne
Hi all, # Program Sketch I create a HiveContext `hiveContext`. With that context, I create a DataFrame `df` from a JDBC relational table. I register the DataFrame `df` via df.registerTempTable("TESTTABLE"). I start a HiveThriftServer2 via HiveThriftServer2.startWithContext(hiveContext). The

Why is no predicate pushdown performed, when using Hive (HiveThriftServer2) ?

2015-10-28 Thread Martin Senne
Hi all, # Program Sketch 1. I create a HiveContext `hiveContext` 2. With that context, I create a DataFrame `df` from a JDBC relational table. 3. I register the DataFrame `df` via df.registerTempTable("TESTTABLE") 4. I start a HiveThriftServer2 via

Re: how to handle OOMError from groupByKey

2015-09-28 Thread Fabien Martin
You can try to reduce the number of containers in order to increase their memory. 2015-09-28 9:35 GMT+02:00 Akhil Das : > You can try to increase the number of partitions to get rid of the OOM > errors. Also try to use reduceByKey instead of groupByKey. > > Thanks >
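The reduceByKey-over-groupByKey advice in this thread can be illustrated in plain Python: grouping materializes every value for a key before reducing, while reducing folds each value into a single running total per key (the data and key names below are made up):

```python
from collections import defaultdict

pairs = [("k", i) for i in range(100)] + [("j", i) for i in range(50)]

# groupByKey-style: all values for a key are held in memory before reducing,
# which is what blows up when one key is very hot.
grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)
group_sums = {k: sum(vs) for k, vs in grouped.items()}

# reduceByKey-style: each value is folded into one running total per key,
# so the full value list for a key is never materialized.
reduced = defaultdict(int)
for key, value in pairs:
    reduced[key] += value
```

In Spark the same folding additionally happens map-side before the shuffle, which is where most of the OOM relief comes from.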

Re: Small File to HDFS

2015-09-03 Thread Martin Menzel
luck Martin 2015-09-03 16:17 GMT+02:00 <nib...@free.fr>: > My main question in case of HAR usage is , is it possible to use Pig on it > and what about performances ? > > - Mail original - > De: "Jörn Franke" <jornfra...@gmail.com> > À: nib...

When will window ....

2015-08-10 Thread Martin Senne
When will window functions be integrated into Spark (without HiveContext)? Sent with AquaMail for Android http://www.aqua-mail.com On 10 August 2015 23:04:22, Michael Armbrust mich...@databricks.com wrote: You will need to use a HiveContext for window functions to work. On Mon, Aug 10,

Re: Spark SQL DataFrame: Nullable column and filtering

2015-08-01 Thread Martin Senne
DataFrame.printSchema (or at least I did not find a way to do so). Cheers, Martin 2015-07-31 22:51 GMT+02:00 Martin Senne martin.se...@googlemail.com: Dear Michael, dear all, a minimal example is listed below. After some further analysis I could figure out that the problem is related

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-31 Thread Martin Senne
programmatically. Could someone sketch the code? Any help welcome, thanks Martin import org.apache.spark.sql.types.{StructField, StructType} import org.apache.spark.{SparkContext, SparkConf} import org.apache.spark.sql.{DataFrame, SQLContext} object

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
in (1, hello, null) as there is no counterpart in mapping (this is the left outer join part). I need to distinguish 1 and 2 because of later inserts (case 1, hello) or updates (case 2, bon). Cheers and thanks, Martin On 30.07.2015 22:58, Michael Armbrust mich...@databricks.com wrote: Perhaps I'm

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
, Martin 2015-07-30 20:23 GMT+02:00 Michael Armbrust mich...@databricks.com: We don't yet update nullability information based on predicates as we don't actually leverage this information in many places yet. Why do you want to update the schema? On Thu, Jul 30, 2015 at 11:19 AM

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
, Martin -- Original message -- From: Peter Rudenko petro.rude...@gmail.com To: zapletal-mar...@email.cz, Sean Owen so...@cloudera.com Date: 25. 3. 2015 13:28:38 Subject: Re: Spark ML Pipeline inaccessible types Hi Martin, here are 2 possibilities to overcome this: 1) Put your

Re: Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
by the real problem I am facing. My issue is that VectorUDT is not accessible by user code and therefore it is not possible to use a custom ML pipeline with the existing Predictors (see the last two paragraphs in my first email). Best Regards, Martin -- Original message -- From

Spark ML Pipeline inaccessible types

2015-03-25 Thread zapletal-martin
not yet expected to be used in this way? Thanks, Martin

Re: DataFrame operation on parquet: GC overhead limit exceeded

2015-03-23 Thread Martin Goodson
Have you tried to repartition() your original data to make more partitions before you aggregate? -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Mon, Mar 23, 2015 at 4:12 PM, Yiannis Gkoufas johngou...@gmail.com wrote: Hi Yin, Yes, I have set

Solving linear equations

2014-10-22 Thread Martin Enzinger
Hi, I'm wondering how to use MLlib for solving equation systems following this pattern: 2*x1 + x2 + 3*x3 + ... + xn = 0; x1 + 0*x2 + 3*x3 + ... + xn = 0; ...; 0*x1 + x2 + 0*x3 + ... + xn = 0. I definitely still have some reading to do to really understand the direct solving
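Setting MLlib aside, a small dense system A·x = b can be solved directly; the sketch below is plain-Python Gaussian elimination with partial pivoting (note that the homogeneous systems in the question would call for a nullspace computation, e.g. via SVD, rather than this nonsingular-system solver):

```python
def solve(a, b):
    """Solve A.x = b by Gaussian elimination with partial pivoting.
    A is a list of rows; a plain-Python sketch for small dense systems."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

x = solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 4.0])   # x is approximately [1.0, 1.0]
```

For the distributed case, the usual approach is an iterative method on a distributed matrix rather than elimination.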

Re: Personalized Page rank in graphx

2014-08-21 Thread Martin Liesenberg
I could take a stab at it, though I'd have some reading up on Personalized PageRank to do, before I'd be able to start coding. If that's OK, I'd get started. Best regards, Martin On 20 August 2014 23:03, Ankur Dave ankurd...@gmail.com wrote: At 2014-08-20 10:57:57 -0700, Mohit Singh mohit1

Job using Spark for Machine Learning

2014-07-29 Thread Martin Goodson
per month and are second only to Google in the contextual advertising space (ok - a distant second!). Details here: http://grnh.se/rl8f25 -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1]

Re: Configuring Spark Memory

2014-07-24 Thread Martin Goodson
Thank you Nishkam, I have read your code. So, for the sake of my understanding, it seems that for each spark context there is one executor per node? Can anyone confirm this? -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Thu, Jul 24, 2014 at 6:12 AM, Nishkam

Re: Configuring Spark Memory

2014-07-24 Thread Martin Goodson
Great - thanks for the clarification Aaron. The offer stands for me to write some documentation and an example that covers this without leaving *any* room for ambiguity. -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Thu, Jul 24, 2014 at 6:09 PM, Aaron

Configuring Spark Memory

2014-07-23 Thread Martin Goodson
available (daemon memory, worker memory etc). Perhaps a worked example could be added to the docs? I would be happy to provide some text as soon as someone can enlighten me on the technicalities! Thank you -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1]

Re: Configuring Spark Memory

2014-07-23 Thread Martin Goodson
by Spark.) Am I reading this incorrectly? Anyway our configuration is 21 machines (one master and 20 slaves) each with 60Gb. We would like to use 4 cores per machine. This is pyspark so we want to leave say 16Gb on each machine for python processes. Thanks again for the advice! -- Martin Goodson

Re: Problem running Spark shell (1.0.0) on EMR

2014-07-22 Thread Martin Goodson
I am also having exactly the same problem, calling using pyspark. Has anyone managed to get this script to work? -- Martin Goodson | VP Data Science (0)20 3397 1240 [image: Inline image 1] On Wed, Jul 16, 2014 at 2:10 PM, Ian Wilkinson ia...@me.com wrote: Hi, I’m trying to run the Spark

Re: TreeNodeException: No function to evaluate expression. type: AttributeReference, tree: id#0 on GROUP BY

2014-07-21 Thread Martin Gammelsæter
in the select clause unless they are also in the group by clause or are inside of an aggregate function. On Jul 18, 2014 5:12 AM, Martin Gammelsæter martingammelsae...@gmail.com wrote: Hi again! I am having problems when using GROUP BY on both SQLContext and HiveContext (same problem). My code

TreeNodeException: No function to evaluate expression. type: AttributeReference, tree: id#0 on GROUP BY

2014-07-18 Thread Martin Gammelsæter
) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) What am I doing wrong? -- Best regards, Martin Gammelsæter

Re: Supported SQL syntax in Spark SQL

2014-07-14 Thread Martin Gammelsæter
? I launched the cluster using spark-ec2 from the 1.0.1 release, so I'm assuming that's taken care of, at least in theory. I just spun down the clusters I had up, but I will revisit this tomorrow and provide the information you requested. Nick -- Best regards, Martin Gammelsæter 92209139

Re: Disabling SparkContext WebUI on port 4040, accessing information programatically?

2014-07-09 Thread Martin Gammelsæter
8, 2014 at 8:43 AM, Koert Kuipers ko...@tresata.com wrote: do you control your cluster and spark deployment? if so, you can try to rebuild with jetty 9.x On Tue, Jul 8, 2014 at 9:39 AM, Martin Gammelsæter martingammelsae...@gmail.com wrote: Digging a bit more I see that there is yet another

Initial job has not accepted any resources means many things

2014-07-09 Thread Martin Gammelsæter
starts up, and instead manually add the jar to the classpath of every worker), but I can't seem to find out how) -- Best regards, Martin Gammelsæter

Re: Disabling SparkContext WebUI on port 4040, accessing information programatically?

2014-07-08 Thread Martin Gammelsæter
on how to solve this? Spark seems to use Jetty 8.1.14, while Dropwizard uses Jetty 9.0.7, so that might be the source of the problem. Any ideas? On Tue, Jul 8, 2014 at 2:58 PM, Martin Gammelsæter martingammelsae...@gmail.com wrote: Hi! I am building a web frontend for a Spark app, allowing users

Re: Spark SQL user defined functions

2014-07-07 Thread Martin Gammelsæter
for clarity, is LocalHiveContext and HiveContext equal if no hive-site.xml is present, or are there still differences? -- Best regards, Martin Gammelsæter

Re: How to use groupByKey and CqlPagingInputFormat

2014-07-05 Thread Martin Gammelsæter
spark driver? Mohammed -Original Message- From: Martin Gammelsæter [mailto:martingammelsae...@gmail.com] Sent: Friday, July 4, 2014 12:43 AM To: user@spark.apache.org Subject: Re: How to use groupByKey and CqlPagingInputFormat On Thu, Jul 3, 2014 at 10:29 PM, Mohammed Guller moham

Spark SQL user defined functions

2014-07-04 Thread Martin Gammelsæter
= sparkCtx.cassandraTable(ks, cf) // registerAsTable etc val res = sql("SELECT id, xmlGetTag(xmlfield, 'sometag') FROM cf") -- Best regards, Martin Gammelsæter

Re: Spark SQL user defined functions

2014-07-04 Thread Martin Gammelsæter
. case class) to SchemaRDD by SQLContext. Load data from Cassandra into RDD of case class, convert it to SchemaRDD and register it, then you can use it in your SQLs. http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-on-rdds Thanks. 2014-07-04 17:59 GMT+09:00 Martin
