RE: Spark Integration Patterns

2016-02-29 Thread skaarthik oss
Check out http://toree.incubator.apache.org/. It might help with your needs.

 

From: moshir mikael [mailto:moshir.mik...@gmail.com] 
Sent: Monday, February 29, 2016 5:58 AM
To: Alex Dzhagriev 
Cc: user 
Subject: Re: Spark Integration Patterns

 

Thanks, will check it too. However, I just want to use the Spark core RDD API and standard 
data sources.

 

On Mon, Feb 29, 2016 at 2:54 PM, Alex Dzhagriev wrote:

Hi Moshir,

 

Regarding streaming, you can take a look at Spark Streaming, the micro-batching 
framework. If it satisfies your needs, it has a number of integrations, so the source 
for the jobs could be Kafka, Flume, or Akka.
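
For illustration, a minimal sketch using the Kafka direct-stream API from 
spark-streaming-kafka could look roughly like the following; the master URL, broker 
address, and topic name are placeholders, not values from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("streaming-sketch")
  .setMaster("spark://master-host:7077")
val ssc = new StreamingContext(conf, Seconds(10))

/* Direct stream over the listed Kafka brokers; each batch becomes an RDD of (key, value) pairs. */
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  Map("metadata.broker.list" -> "broker-host:9092"),
  Set("events"))

/* Count messages per 10-second batch as a trivial job. */
stream.map(_._2).count().print()

ssc.start()
ssc.awaitTermination()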

 

Cheers, Alex.

 

On Mon, Feb 29, 2016 at 2:48 PM, moshir mikael wrote:

Hi Alex,

thanks for the link. Will check it.

Does someone know of a more streamlined approach?

 

 

 

On Mon, Feb 29, 2016 at 10:28 AM, Alex Dzhagriev wrote:

Hi Moshir,

 

I think you can use the REST API provided with Spark: 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala

 

Unfortunately, I haven't found any documentation, but it looks fine.
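
For what it's worth, a rough sketch of how a client might post a submission to the 
standalone master's REST endpoint (default port 6066) is below; the host, jar path, and 
main class are placeholders, and the JSON field names are based on a reading of the 
RestSubmissionServer sources rather than any documented contract:

import java.net.{HttpURLConnection, URL}

/* JSON body of a CreateSubmissionRequest; all values here are placeholders. */
val payload =
  """{
    |  "action": "CreateSubmissionRequest",
    |  "clientSparkVersion": "1.6.0",
    |  "appResource": "file:/path/to/app.jar",
    |  "mainClass": "com.example.Main",
    |  "appArgs": [],
    |  "environmentVariables": {"SPARK_ENV_LOADED": "1"},
    |  "sparkProperties": {
    |    "spark.master": "spark://master-host:6066",
    |    "spark.app.name": "rest-submit-sketch",
    |    "spark.submit.deployMode": "cluster",
    |    "spark.jars": "file:/path/to/app.jar"
    |  }
    |}""".stripMargin

/* POST the request to the submission endpoint and print the master's JSON response. */
val conn = new URL("http://master-host:6066/v1/submissions/create")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8")
conn.setDoOutput(true)
conn.getOutputStream.write(payload.getBytes("UTF-8"))
println(scala.io.Source.fromInputStream(conn.getInputStream).mkString)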

Thanks, Alex.

 

On Sun, Feb 28, 2016 at 3:25 PM, mms wrote:

Hi, I cannot find a simple example showing how a typical application can 'connect' to a 
remote Spark cluster and interact with it. Let's say I have a Python web application 
hosted somewhere outside a Spark cluster, with just Python installed on it. How can I 
talk to Spark without using a notebook, or using ssh to connect to a cluster master 
node? I know of spark-submit and spark-shell, but forking a process on a remote host to 
execute a shell script seems like a lot of effort. What are the recommended ways to 
connect to and query Spark from a remote client? Thanks!


 

 



RE: Spark with .NET

2016-02-09 Thread skaarthik oss
Arko – you could use the following links to get started with the SparkCLR API and use 
C# with Spark for DataFrame processing. If you need support for the interactive 
scenario, please feel free to share your scenario and requirements with the SparkCLR 
project. The interactive scenario is one of the focus areas of the current milestone in 
the SparkCLR project.

· https://github.com/Microsoft/SparkCLR/blob/master/examples/JdbcDataFrame/Program.cs

· https://github.com/Microsoft/SparkCLR/blob/master/csharp/Samples/Microsoft.Spark.CSharp/DataFrameSamples.cs

 

 

Ted – System.Data.DataSetExtensions is a reference that is automatically added 
when a C# project is created in Visual Studio. As Silvio pointed out below, it 
is a .NET assembly and not really used by SparkCLR.

 

From: Silvio Fiorito [mailto:silvio.fior...@granturing.com] 
Sent: Tuesday, February 9, 2016 1:31 PM
To: Ted Yu ; Bryan Jeffrey 
Cc: Arko Provo Mukherjee ; user 

Subject: Re: Spark with .NET

 

That’s just a .NET assembly (not related to Spark Datasets), and it doesn’t look like 
they’re actually using it. It’s typically a default reference pulled in by the project 
templates.

 

The code though is available from Mono here: 
https://github.com/mono/mono/tree/master/mcs/class/System.Data.DataSetExtensions

 

From: Ted Yu
Date: Tuesday, February 9, 2016 at 3:56 PM
To: Bryan Jeffrey
Cc: Arko Provo Mukherjee, user
Subject: Re: Spark with .NET

 

Looks like they have some system support whose source is not in the repo:



 

FYI

 

On Tue, Feb 9, 2016 at 12:17 PM, Bryan Jeffrey wrote:

Arko,


Check this out: https://github.com/Microsoft/SparkCLR

 

This is a Microsoft authored C# language binding for Spark.

 

Regards,

 

Bryan Jeffrey

 

On Tue, Feb 9, 2016 at 3:13 PM, Arko Provo Mukherjee wrote:

Doesn't seem to be supported, but thanks! I will probably write some .NET 
wrapper in my front end and use the Java API in the backend.

Warm regards

Arko

 

 

On Tue, Feb 9, 2016 at 12:05 PM, Ted Yu wrote:

This thread is related:

http://search-hadoop.com/m/q3RTtwp4nR1lugin1 (".NET on Apache Spark")

 

On Tue, Feb 9, 2016 at 11:43 AM, Arko Provo Mukherjee wrote:

Hello, 

 

I want to use Spark (preferably Spark SQL) from C#. Does anyone have any pointers for 
that?

Thanks & regards

Arko

 

 

 

 

 



how to reference aggregate columns

2015-12-09 Thread skaarthik oss
I am trying to process an aggregate column. The name of the column is
"SUM(clicks)", which is automatically assigned when I use the SUM operator on a
column named "clicks". I am trying to find the max value in this aggregated
column. However, using the max operator on this aggregated column results in a
parser error.

How can I reference an aggregated column when applying operators?

Code to repro this behavior:
val rows = Seq(("X", 1), ("Y", 5), ("Z", 4))
val rdd = sc.parallelize(rows)
val dataFrame = rdd.toDF("user", "clicks")
val sumDf = dataFrame.groupBy("user").agg(("clicks", "sum"))
sumDf.registerTempTable("tempTable")
val ttrows = sqlContext.sql("select * from tempTable")
ttrows.show  /* column names are "user" & "SUM(clicks)" */
val ttrows2 = sqlContext.sql("select max(SUM(clicks)) from tempTable") /* this line results in the following error */
val sumCol = ttrows.select("SUM(clicks)") /* this works fine */


15/12/09 17:50:42.938 INFO ParseDriver: Parsing command: select max(SUM(clicks)) from tempTable
15/12/09 17:50:42.938 INFO ParseDriver: Parse Completed
org.apache.spark.sql.AnalysisException: cannot resolve 'clicks' given input columns user, SUM(clicks); line 1 pos 15
        at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
        at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:329)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenUp(TreeNode.scala:329)
        at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:283)
        at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:299)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at
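
One possible workaround (not part of the original message, and assuming the Spark 
1.5/1.6 DataFrame API) is to give the aggregate an explicit alias so the generated 
"SUM(clicks)" name never has to be parsed again; the second variant assumes the SQL 
parser in this Spark version accepts backquoted identifiers:

import org.apache.spark.sql.functions.sum

/* Alias the aggregate so downstream SQL and DataFrame operators can use a plain column name. */
val sumDf2 = dataFrame.groupBy("user").agg(sum("clicks").alias("total_clicks"))
sumDf2.registerTempTable("tempTable2")
sqlContext.sql("select max(total_clicks) from tempTable2").show()

/* Alternatively, keep the generated name and escape it with backticks: */
sqlContext.sql("select max(`SUM(clicks)`) from tempTable").show()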