How is order ensured in the JDBC relation provider when inserting data from multiple executors?

2016-11-21 Thread Niranda Perera
core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L277 -- Niranda Perera @n1r44 <https://twitter.com/N1R44> +94 71 554 8430 https://www.linkedin.com/in/niranda https://pythagoreanscript.wordpress.com/
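In the JDBC data source, each executor writes its own partition over its own connection, so the target table ends up in partition-completion order rather than input order; no global ordering is guaranteed. A plain-Python sketch of that effect (not Spark code; the partitioning and the finish order are hypothetical):

```python
# Simulate 4 executors each inserting one partition of 12 rows.
rows = list(range(12))
partitions = [rows[i::4] for i in range(4)]      # round-robin partitioning
completion_order = [2, 0, 3, 1]                  # hypothetical finish order

# The "table" receives rows in whatever order partitions complete.
table = [r for p in completion_order for r in partitions[p]]

assert sorted(table) == rows    # all rows arrive...
assert table != rows            # ...but not in the original order
```

If the table must be read back in a particular order, an ORDER BY at query time is the reliable approach.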

SQL Syntax for pivots

2016-11-16 Thread Niranda Perera
) * reshape2 (R) - dcast(df, A + B ~ C, sum) * Oracle 11g - SELECT * FROM df PIVOT (sum(D) FOR C IN ('small', 'large')) p Best [1] http://www.slideshare.net/SparkSummit/pivoting-data-with-sparksql-by-andrew-ray
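The pivot semantics being compared (spread one column's values into new columns while aggregating another) can be sketched in plain Python; the column names A, C, D follow the examples above, and the sample rows are hypothetical:

```python
from collections import defaultdict

# Rows of (A, C, D): pivot C's values into columns, summing D,
# mirroring dcast(df, A ~ C, sum) / Oracle's PIVOT (sum(D) FOR C IN (...)).
rows = [("x", "small", 1), ("x", "large", 2), ("x", "small", 3), ("y", "large", 4)]

pivoted = defaultdict(lambda: defaultdict(int))
for a, c, d in rows:
    pivoted[a][c] += d

assert pivoted["x"]["small"] == 4   # 1 + 3
assert pivoted["y"]["large"] == 4
```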

Executors go OOM when using JDBC relation provider

2016-08-16 Thread Niranda Perera
. This OOM exception is actually a blocker! Is there any other tuning I should do? And it certainly worries me that MySQL gave a significantly faster result than Spark here! Looking forward to hearing from you! Best
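Not from the thread, but the usual first remedies for OOM with the JDBC source are to split the read into range partitions and to bound the JDBC fetch size so result sets are streamed rather than materialized in one go. A sketch of the relevant DataFrame reader options (the option names are Spark's JDBC options; the URL, table, and bounds are hypothetical):

```properties
# Passed via .option(...) on the JDBC reader, not spark-defaults.conf.
url             jdbc:mysql://db-host:3306/mydb
dbtable         big_table
# Split the scan into 16 range queries on the id column:
partitionColumn id
lowerBound      1
upperBound      1000000
numPartitions   16
# Stream rows from the driver-side result set in chunks:
fetchsize       1000
```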

Why is the spark-yarn module excluded from the spark parent pom?

2016-07-12 Thread Niranda Perera
Hi guys, I could not find the spark-yarn module in the spark parent pom. Is there any particular reason why it has been excluded? Best

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
ere Spark is > now. > > On Wed, Jul 6, 2016 at 11:12 PM, Niranda Perera <niranda.per...@gmail.com> > wrote: > >> Thanks Reynold >> >> On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin <r...@databricks.com> wrote: >> >>> Yes definitely.

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Thanks Reynold On Thu, Jul 7, 2016 at 11:40 AM, Reynold Xin <r...@databricks.com> wrote: > Yes definitely. > > > On Wed, Jul 6, 2016 at 11:08 PM, Niranda Perera <niranda.per...@gmail.com> > wrote: > >> Thanks Reynold for the prompt response. Do you think we co

Re: Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
ts no longer work with branch-1.4. You can build from the > branch yourself, but it might be better to upgrade to the later versions. > > On Wed, Jul 6, 2016 at 11:02 PM, Niranda Perera <niranda.per...@gmail.com> > wrote: > >> Hi guys, >> >> May I know if you

Latest spark release in the 1.4 branch

2016-07-07 Thread Niranda Perera
Hi guys, may I know if you have halted development on the Spark 1.4 branch? I see that there is a release tag for 1.4.2, but it was never released. Can we expect a 1.4.x bug-fix release anytime soon? Best

Re: Possible deadlock in registering applications in the recovery mode

2016-04-22 Thread Niranda Perera
Hi guys, any update on this? Best On Wed, Apr 20, 2016 at 3:00 AM, Niranda Perera <niranda.per...@gmail.com> wrote: > Hi Reynold, > > I have created a JIRA for this [1]. I have also created a PR for the same > issue [2]. > > Would be very grateful if you could

Re: Possible deadlock in registering applications in the recovery mode

2016-04-19 Thread Niranda Perera
/browse/SPARK-14736 [2] https://github.com/apache/spark/pull/12506 On Mon, Apr 18, 2016 at 9:02 AM, Reynold Xin <r...@databricks.com> wrote: > I haven't looked closely at this, but I think your proposal makes sense. > > > On Sun, Apr 17, 2016 at 6:40 PM, Niranda Perera <nir

Re: Possible deadlock in registering applications in the recovery mode

2016-04-17 Thread Niranda Perera
Hi guys, Any update on this? Best On Tue, Apr 12, 2016 at 12:46 PM, Niranda Perera <niranda.per...@gmail.com> wrote: > Hi all, > > I have encountered a small issue in the standalone recovery mode. > > Let's say there was an application A running in the cluster. Due to som

Possible deadlock in registering applications in the recovery mode

2016-04-12 Thread Niranda Perera
Hi all, I have encountered a small issue in the standalone recovery mode. Let's say there was an application A running in the cluster. Due to some issue, the entire cluster, together with application A, goes down. Later, the cluster comes back online, and the master then goes into the

Control the stdout and stderr streams in a executor JVM

2016-02-28 Thread Niranda Perera
Hi all, Is there any possibility to control the stdout and stderr streams in an executor JVM? I understand that there are some configurations provided from the spark conf as follows spark.executor.logs.rolling.maxRetainedFiles spark.executor.logs.rolling.maxSize
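For reference, the rolling-log properties mentioned above would go in spark-defaults.conf along these lines (the values are illustrative):

```properties
# Roll executor stdout/stderr by size and cap how many files are kept.
spark.executor.logs.rolling.strategy          size
# Max bytes per log file before rolling (128 MB here):
spark.executor.logs.rolling.maxSize           134217728
# Older rolled files beyond this count are deleted:
spark.executor.logs.rolling.maxRetainedFiles  5
```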

Re: spark job scheduling

2016-01-27 Thread Niranda Perera
esign decision. > > Best, > > Chayapan (A) > > On Thu, Jan 28, 2016 at 10:07 AM, Niranda Perera <niranda.per...@gmail.com > > wrote: > >> hi all, >> >> I have a few questions on spark job scheduling. >> >> 1. As I understand, the smallest un

spark job scheduling

2016-01-27 Thread Niranda Perera
hi all, I have a few questions on spark job scheduling. 1. As I understand, a task is the smallest unit of work an executor can perform. In the 'fair' scheduler mode, let's say a job is submitted to the spark ctx which has a considerable amount of work to do in a task. While such a 'big' task is running,
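For context on the fair-mode question: the FAIR scheduler shares executors between pools at task granularity, but it does not preempt a task that is already running. A minimal sketch of enabling it in spark-defaults.conf (the allocation-file path is hypothetical):

```properties
# Schedule jobs fairly across pools instead of FIFO.
spark.scheduler.mode             FAIR
# Optional XML file defining pools, weights, and minShare:
spark.scheduler.allocation.file  /path/to/fairscheduler.xml
```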

taking the heap dump when an executor goes OOM

2015-10-11 Thread Niranda Perera
Hi all, is there a way for me to get the heap-dump hprof of an executor JVM when it goes out of memory? Is this currently supported, or do I have to change some configurations? cheers
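This is not Spark-specific: the standard JVM flags cover it, and they can be passed to executors via extraJavaOptions (the dump path here is hypothetical):

```properties
# Have each executor JVM write an .hprof dump when it throws OutOfMemoryError.
spark.executor.extraJavaOptions  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps
```

The dump lands on the worker machine that ran the executor, so it has to be collected from there.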

passing an AbstractFunction1 to sparkContext().runJob instead of a Closure

2015-10-09 Thread Niranda Perera
hi all, I want to run a job in the spark context and, since I am running the system in the Java environment, I cannot use a closure in sparkContext().runJob. Instead, I am passing an AbstractFunction1 extension. While the jobs run without an issue, I constantly get the following WARN

adding jars to the classpath with the relative path to spark home

2015-09-08 Thread Niranda Perera
Hi, is it possible to add jars to the spark executor/driver classpath using a path relative to the Spark home? I need to set the following settings in the spark conf - spark.driver.extraClassPath - spark.executor.extraClassPath the reason why I need to use the relative

Re: taking an n number of rows from an RDD starting from an index

2015-09-02 Thread Niranda Perera
>> I think rdd.toLocalIterator is what you want. But it will keep one >> partition's data in-memory. >> >> On Wed, Sep 2, 2015 at 10:05 AM, Niranda Perera <niranda.per...@gmail.com >> > wrote: >> >>> Hi all, >>> >>> I have a large

Spark SQL sort by and collect by in multiple partitions

2015-09-02 Thread Niranda Perera
Hi all, I have been using SORT BY and ORDER BY in Spark SQL and observed the following: when using SORT BY and collecting the results, the results are sorted partition by partition. Example: if we have 1, 2, ..., 12 and 4 partitions and I want to sort in descending order, partition 0 (p0)
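This matches SORT BY's intended behavior: it sorts within each partition, while ORDER BY produces a single global ordering. The difference can be sketched in plain Python (4 round-robin partitions are assumed for illustration):

```python
# Values 1..12 spread across 4 partitions, as in the example above.
data = list(range(1, 13))
partitions = [data[i::4] for i in range(4)]

# SORT BY: each partition sorted independently, then collected in order.
sort_by = [v for p in partitions for v in sorted(p, reverse=True)]
# ORDER BY: one global sort.
order_by = sorted(data, reverse=True)

assert sorted(sort_by) == sorted(order_by)   # same rows overall
assert sort_by != order_by                   # but different collected order
```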

taking an n number of rows from an RDD starting from an index

2015-09-01 Thread Niranda Perera
Hi all, I have a large set of data which would not fit into the memory. So, I want to take n rows from the RDD starting from a particular index. For example, take 1000 rows starting from index 1001. I see that there is a take(num: Int): Array[T] method in the RDD, but it only returns
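A commonly used pattern for this is to pair each row with a global index (e.g. zipWithIndex on an RDD) and then filter the wanted range. The core idea in plain Python (row contents are hypothetical):

```python
def take_range(rows, start, n):
    """Keep rows whose global index falls in [start, start + n)."""
    return [r for i, r in enumerate(rows) if start <= i < start + n]

rows = [f"row-{i}" for i in range(5000)]
window = take_range(rows, 1001, 1000)

assert window[0] == "row-1001"
assert len(window) == 1000
```

On an RDD the filter runs distributed, so only the selected window is collected to the driver.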

dynamically update the master list of a worker or a spark context

2015-07-27 Thread Niranda Perera
Hi all, I have been developing a custom recovery implementation for spark masters and workers using Hazelcast clustering. In the Spark worker code [1], we see that a list of masters needs to be provided at worker start-up in order to achieve high availability. This effectively means that

databases currently supported by Spark SQL JDBC

2015-07-09 Thread Niranda Perera
Hi, I'm planning to use the Spark SQL JDBC datasource provider against various RDBMS databases. What are the databases currently supported by the Spark JDBC relation provider? rgds

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-07-05 Thread Niranda Perera
be the reason for this? rgds On Thu, Jun 25, 2015 at 11:42 AM, Niranda Perera niranda.per...@gmail.com wrote: thanks Josh. this looks very similar to my problem. On Thu, Jun 25, 2015 at 11:32 AM, Josh Rosen rosenvi...@gmail.com wrote: This sounds like https://issues.apache.org/jira/browse

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-07-05 Thread Niranda Perera
Hi, Sorry this was a class loading issue at my side. Sorted it out. Sorry if I caused any inconvenience Rgds Niranda Perera +94 71 554 8430 On Jul 5, 2015 17:08, Niranda Perera niranda.per...@gmail.com wrote: Hi Josh, I tried using the spark 1.4.0 upgrade. here is the class I'm trying

Re: Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-06-25 Thread Niranda Perera
:57 PM, Niranda Perera niranda.per...@gmail.com wrote: Hi all, I'm trying to implement a custom StandaloneRecoveryModeFactory in the Java environment. Pls find the implementation here. [1] . I'm new to Scala, hence I'm trying to use Java environment as much as possible. when I start

Error in invoking a custom StandaloneRecoveryModeFactory in java env (Spark v1.3.0)

2015-06-24 Thread Niranda Perera
Hi all, I'm trying to implement a custom StandaloneRecoveryModeFactory in the Java environment. Pls find the implementation here. [1] . I'm new to Scala, hence I'm trying to use Java environment as much as possible. when I start a master with spark.deploy.recoveryMode.factory property to be

custom REST port from spark-defaults.conf

2015-06-23 Thread Niranda Perera
Hi, is there a configuration setting to set a custom port number for the master REST URL? Can that be included in spark-defaults.conf? cheers
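For reference, the standalone master's REST submission port is configurable and can go in spark-defaults.conf (6066 is the default):

```properties
# REST submission server on the standalone master.
spark.master.rest.enabled  true
spark.master.rest.port     6066
```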

Re: Tentative due dates for Spark 1.3.2 release

2015-05-17 Thread Niranda Perera
Hi Reynold, sorry, my mistake. can do that. thanks On Mon, May 18, 2015 at 9:51 AM, Reynold Xin r...@databricks.com wrote: You can just look at this branch, can't you? https://github.com/apache/spark/tree/branch-1.3 On Sun, May 17, 2015 at 9:20 PM, Niranda Perera niranda.per...@gmail.com

Re: Tentative due dates for Spark 1.3.2 release

2015-05-17 Thread Niranda Perera
introduce new API's. If you have a particular bug fix you are waiting for, you can always build Spark off of that branch. - Patrick On Fri, May 15, 2015 at 12:46 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi, May I know the tentative release dates for spark 1.3.2? rgds

Tentative due dates for Spark 1.3.2 release

2015-05-15 Thread Niranda Perera
Hi, May I know the tentative release dates for spark 1.3.2? rgds -- Niranda

Re: Custom PersistenceEngine and LeaderAgent implementation in Java

2015-05-01 Thread Niranda Perera
29, 2015 at 11:02 PM, Niranda Perera niranda.per...@gmail.com wrote: Hi, this follows the following feature in this feature [1] I'm trying to implement a custom persistence engine and a leader agent in the Java environment. vis-a-vis scala, when I implement the PersistenceEngine trait

Custom PersistenceEngine and LeaderAgent implementation in Java

2015-04-30 Thread Niranda Perera
Hi, this follows the feature described in [1]. I'm trying to implement a custom persistence engine and a leader agent in the Java environment. Vis-a-vis Scala, when I implement the PersistenceEngine trait in Java, I would have to implement methods such as readPersistedData,

Migrating from 1.2.1 to 1.3.0 - org.apache.spark.sql.api.java.Row

2015-04-01 Thread Niranda Perera
Hi, previously in 1.2.1, the result row from a Spark SQL query was an org.apache.spark.sql.api.java.Row. In 1.3.0 I do not see an sql.api.java package. So does it mean that even the SQL query result row is an implementation of org.apache.spark.sql.Row, such as GenericRow, etc.? -- Niranda

Connecting a worker to the master after a spark context is made

2015-03-20 Thread Niranda Perera
Hi, Please consider the following scenario. I've started the spark master by invoking the org.apache.spark.deploy.master.Master.startSystemAndActor method in a java code and connected a worker to it using the org.apache.spark.deploy.worker.Worker.startSystemAndActor method. and then I have

Re: Fixed worker ports in the spark worker

2015-03-18 Thread Niranda Perera
, 2015 at 11:10 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi all, I see that spark server opens up random ports, especially in the workers. is there any way to fix these ports or give an set of ports for the worker to choose from? cheers -- Niranda -- [image: Sigmoid

Fixed worker ports in the spark worker

2015-03-17 Thread Niranda Perera
Hi all, I see that the spark server opens up random ports, especially in the workers. Is there any way to fix these ports or give a set of ports for the worker to choose from? cheers -- Niranda
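Several of the otherwise-random ports can be pinned through configuration; a sketch (the port numbers are illustrative):

```properties
# spark-defaults.conf: pin the block manager port and allow a bounded
# number of +1 retries when a pinned port is busy.
spark.blockManager.port  7005
spark.port.maxRetries    16

# In spark-env.sh, the worker's own ports can be fixed:
# SPARK_WORKER_PORT=7001
# SPARK_WORKER_WEBUI_PORT=8081
```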

Deploying master and worker programmatically in Java

2015-03-03 Thread Niranda Perera
Hi, I want to start a Spark standalone cluster programmatically in Java. I have been checking these classes, - org.apache.spark.deploy.master.Master - org.apache.spark.deploy.worker.Worker I successfully started a master with this simple main class. public static void main(String[] args) {

OSGI bundles for spark project

2015-02-20 Thread Niranda Perera
Hi, I am interested in a Spark OSGI bundle. While checking the Maven repository, I found that one has still not been published. Can we expect an OSGI bundle to be released soon? Is it on the Spark project roadmap? Rgds -- Niranda

Re: OSGI bundles for spark project

2015-02-20 Thread Niranda Perera
and is not generally embeddable. Packaging is generally 'out of scope' for the core project beyond the standard Maven and assembly releases. On Fri, Feb 20, 2015 at 8:33 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi, I am interested in a Spark OSGI bundle. While checking the maven

Re: Replacing Jetty with Tomcat

2015-02-19 Thread Niranda Perera
the web server configurable. Mostly because there's no real problem in running an HTTP service internally based on Netty while you run your own HTTP service based on something else like Tomcat. What's the problem? On Wed, Feb 18, 2015 at 3:14 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi

Re: Replacing Jetty with Tomcat

2015-02-17 Thread Niranda Perera
be able to switch it out without rewriting a fair bit of code, no, but you don't need to. On Mon, Feb 16, 2015 at 5:08 AM, Niranda Perera niranda.per...@gmail.com wrote: Hi, We are thinking of integrating Spark server inside a product. Our current product uses Tomcat as its webserver

Re: Replacing Jetty with Tomcat

2015-02-15 Thread Niranda Perera
are using the embedded mode of Jetty, rather than using servlets. Even if it is possible, you probably wouldn't want to embed Spark in your application server ... On Sun, Feb 15, 2015 at 9:08 PM, Niranda Perera niranda.per...@gmail.com wrote: Hi, We are thinking of integrating Spark server

Replacing Jetty with Tomcat

2015-02-15 Thread Niranda Perera
Hi, We are thinking of integrating Spark server inside a product. Our current product uses Tomcat as its webserver. Is it possible to switch the Jetty webserver in Spark to Tomcat off-the-shelf? Cheers -- Niranda

create a SchemaRDD from a custom datasource

2015-01-13 Thread Niranda Perera
Hi, We have a custom datasources API, which connects to various data sources and exposes them out as a common API. We are now trying to implement the Spark datasources API released in 1.2.0 to connect Spark for analytics. Looking at the sources API, we figured out that we should extend a scan

Guava 11 dependency issue in Spark 1.2.0

2015-01-06 Thread Niranda Perera
: com.google.common.hash.HashFunction.hashInt error occurs, which is understandable because hashInt is not available before Guava 12. So, I'm wondering why this occurs? Cheers -- Niranda Perera

Re: Can the Scala classes in the spark source code be inherited in Java classes?

2014-12-02 Thread Niranda Perera
are compiled down to classes in bytecode. Take a look at this: https://twitter.github.io/scala_school/java.html Note that questions like this are not exactly what this dev list is meant for ... On Mon, Dec 1, 2014 at 9:22 PM, Niranda Perera nira...@wso2.com wrote: Hi, Can the Scala classes

Re: Creating a SchemaRDD from an existing API

2014-12-01 Thread Niranda Perera
library for reading Avro data https://github.com/databricks/spark-avro. On Thu, Nov 27, 2014 at 10:31 PM, Niranda Perera nira...@wso2.com wrote: Hi, I am evaluating Spark for an analytic component where we do batch processing of data using SQL. So, I am particularly interested in Spark

Can the Scala classes in the spark source code be inherited in Java classes?

2014-12-01 Thread Niranda Perera
Hi, can the Scala classes in the Spark source code be inherited (and other OOP concepts applied) in Java classes? I want to customize some part of the code, but I would like to do it in a Java environment. Rgds -- *Niranda Perera* Software Engineer, WSO2 Inc. Mobile: +94-71-554-8430 Twitter: @n1r44

Creating a SchemaRDD from an existing API

2014-11-27 Thread Niranda Perera
[1] https://github.com/wso2-dev/carbon-analytics/tree/master/components/xanalytics

Getting the execution times of spark job

2014-09-02 Thread Niranda Perera

Storage Handlers in Spark SQL

2014-08-21 Thread Niranda Perera
+for+Hive https://docs.wso2.com/display/BAM241/Creating+Hive+Queries+to+Analyze+Data#CreatingHiveQueriestoAnalyzeData-cas I would like to know where Spark SQL can work with these storage handlers (while using HiveContext may be) ? Best regards -- *Niranda Perera* Software Engineer, WSO2 Inc. Mobile