Re: Hi all,

2017-11-04 Thread אורן שמון
Hi Jean,
We prepare the data for all the other jobs. We have a lot of jobs scheduled
at different times, but all of them need to read the same raw data.

On Fri, Nov 3, 2017 at 12:49 PM Jean Georges Perrin 
wrote:

> Hi Oren,
>
> Why don’t you want to use a GroupBy? You can cache or checkpoint the
> result and use it in your process, keeping everything in Spark and avoiding
> save/ingestion...
>
>
> > On Oct 31, 2017, at 08:17, ⁨אורן שמון⁩ <⁨oren.sha...@gmail.com⁩> wrote:
> >
> > I have 2 Spark jobs: one is the pre-process and the second is the process.
> > The process job needs to calculate something for each user in the data.
> > I want to avoid a shuffle like groupBy, so I am thinking about saving the
> > result of the pre-process either bucketed by user in Parquet, or
> > re-partitioned by user, and saving that result.
> >
> > Which is preferable, and why?
> > Thanks in advance,
> > Oren
>
>
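The bucketing option discussed above can be sketched in Scala roughly as follows; the DataFrame, column, and table names, the input path, and the bucket count are all hypothetical placeholders, and note that `bucketBy` requires writing to a metastore table via `saveAsTable`:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("preprocess").getOrCreate()
val preprocessed = spark.read.parquet("/data/raw")  // hypothetical path

// Pre-process job: persist the result bucketed by user so downstream
// jobs can aggregate per user without a shuffle.
preprocessed.write
  .bucketBy(64, "userId")   // bucket count is an arbitrary example
  .sortBy("userId")
  .format("parquet")
  .saveAsTable("preprocessed_by_user")

// Downstream job: Spark knows the table's bucketing, so this per-user
// aggregation can avoid the exchange a plain groupBy would otherwise need.
val perUser = spark.table("preprocessed_by_user")
  .groupBy("userId")
  .count()
```

Re-partitioning before a plain `save()` only helps within the same application; bucketing persists the layout in the table metadata, which is why it can benefit separately scheduled jobs.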


Re: Hi all,

2017-11-03 Thread Jean Georges Perrin
Hi Oren,

Why don’t you want to use a GroupBy? You can cache or checkpoint the result and 
use it in your process, keeping everything in Spark and avoiding 
save/ingestion...


> On Oct 31, 2017, at 08:17, ⁨אורן שמון⁩ <⁨oren.sha...@gmail.com⁩> wrote:
> 
> I have 2 Spark jobs: one is the pre-process and the second is the process.
> The process job needs to calculate something for each user in the data.
> I want to avoid a shuffle like groupBy, so I am thinking about saving the
> result of the pre-process either bucketed by user in Parquet, or
> re-partitioned by user, and saving that result.
> 
> Which is preferable, and why?
> Thanks in advance,
> Oren
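The cache/checkpoint alternative suggested above might look roughly like this in Scala (the input path, checkpoint directory, column names, and aggregation are hypothetical placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().appName("process").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")  // hypothetical dir

val raw = spark.read.parquet("/data/raw")                // hypothetical path
val perUser = raw.groupBy("userId").agg(sum("value").as("total"))

// cache() keeps the grouped result in memory for reuse by later stages;
// checkpoint() additionally writes it to stable storage and truncates
// the lineage, so the shuffle is paid only once within this application.
perUser.cache()
val checkpointed = perUser.checkpoint()
```

This keeps everything inside one Spark application, at the cost of running the shuffle once, rather than persisting a bucketed layout for separately scheduled jobs.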


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Hi

2017-04-07 Thread kant kodali
Oops, sorry. Please ignore this; wrong mailing list.


Re: Hi, guys, does anyone use Spark in finance market?

2016-09-01 Thread Taotao.Li
Hi, Adam, many thanks for your detailed reply; the three videos are
very useful references for me. Actually, the app submitted to the IBM Spark
Contest is a very small demo. I'll do much more work to enhance that model, and
recently we started a new project which aims to build a platform
that makes it possible and easy for Chinese financial institutions to
analyze high-frequency market data, which will be ~30 GB per day.


great thanks again,

On Thu, Sep 1, 2016 at 10:25 PM, Adam Roberts  wrote:

> Hi, yes, there's definitely a market for Apache Spark and financial
> institutions, I can't provide specific details but to answer your survey:
> "yes" and "more than a few GB!"
>
> Here are a couple of examples showing Spark with financial data, full
> disclosure that I work for IBM, I'm sure there are lots more examples you
> can find too:
>
>- https://www.youtube.com/watch?v=VWBNoIwGEjo shows how Spark can be
>used with simple sentiment analysis to figure out correlations between real
>world events and stock market changes. The Spark-specific part runs from 3:04
>until the 8th minute.
>- https://www.youtube.com/watch?v=sDmWcuO5Rk8 is a similar example
>where Spark is again used with sentiment analysis. One could also analyse
>financial data to identify trends, I think a lot of the machine learning
>APIs will be useful here e.g. logistic regression with many features could
>be used to decide whether or not an investment is a good idea based on
>training data (so we'd look at real outcomes from previous speculations)
>
>
> In both cases you can see Spark is a very important component for
> performing the calculations with financial data.
>
> I also know that Goldman Sachs mentioned they are interested in Spark, one
> talk is at https://www.youtube.com/watch?v=HWwAoTK2YrQ, so this is more
> evidence of the financial industries paying attention to big data and Spark.
>
> Regarding your app: I expected it to be similar to the first example where
> the signals you mention are real world events (e.g. the fed lowers interest
> rates or companies are rumoured to either be about to float or be
> acquired).
>
> At the 4:30 mark I think you actually identify previous index values and
> extrapolate what they are likely to become, so in theory your system
> would become more accurate over time, although would going off indexes alone
> be sufficient (if indeed this is what you're doing)?
>
> I think you'd want to combine this with real world speculation/news to
> figure out *why* the price is likely to change, how much by and in which
> direction.
>
> I agree that Apache Spark can be just the right tool for doing the heavy
> lifting required for analysis, computation and modelling of big data so
> looking forward to future Spark work in this area, and I wonder how we as
> Spark developers can make it easier/more powerful for Spark users to do so.
>
>
>
>
> From:"Taotao.Li" 
> To:user 
> Date:30/08/2016 14:14
> Subject:Hi, guys, does anyone use Spark in finance market?
> --
>
>
>
>
> Hi, guys,
>
>  I'm a quant engineer in China, and I believe using Spark in the financial
> market is very promising. But I haven't found cases which combine
> Spark and finance.
>
> So here I wanna do a small survey:
>
>- do you guys use Spark in financial-market-related projects?
>- if yes, how much data was fed into your Spark application?
>
>
>  thanks a lot.
>
> *___*
> A little ad: I attended the IBM Spark Hackathon, which is here:
> *http://apachespark.devpost.com/* , and
> I submitted a small application which will be used in my strategies. I hope
> you guys can give me a vote and some suggestions on how to use Spark in the
> financial market to discover trade opportunities.
>
> here is my small app:
> *http://devpost.com/software/spark-in-finance-quantitative-investing*
> 
>
> thanks a lot.​
>
>
> --
> *___*
> Quant | Engineer | Boy
> *___*
> *blog*:*http://litaotao.github.io*
> 
> *github*: *www.github.com/litaotao* 
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>



-- 
*___*
Quant | Engineer | Boy
*___*
*blog*:http://litaotao.github.io

*github*: www.github.com/litaotao


Re: Hi, guys, does anyone use Spark in finance market?

2016-09-01 Thread Adam Roberts
Hi, yes, there's definitely a market for Apache Spark and financial 
institutions, I can't provide specific details but to answer your survey: 
"yes" and "more than a few GB!"

Here are a couple of examples showing Spark with financial data, full 
disclosure that I work for IBM, I'm sure there are lots more examples you 
can find too:

https://www.youtube.com/watch?v=VWBNoIwGEjo shows how Spark can be used 
with simple sentiment analysis to figure out correlations between real 
world events and stock market changes. The Spark-specific part runs from 
3:04 until the 8th minute.
https://www.youtube.com/watch?v=sDmWcuO5Rk8 is a similar example where 
Spark is again used with sentiment analysis. One could also analyse 
financial data to identify trends, I think a lot of the machine learning 
APIs will be useful here e.g. logistic regression with many features could 
be used to decide whether or not an investment is a good idea based on 
training data (so we'd look at real outcomes from previous speculations)

In both cases you can see Spark is a very important component for 
performing the calculations with financial data.
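The logistic-regression idea above could be sketched with spark.ml along these lines; the dataset path, feature columns, and label column are hypothetical stand-ins for real outcome data:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("invest-lr").getOrCreate()

// Real outcomes from previous speculations (hypothetical path and schema).
val outcomes = spark.read.parquet("/data/past_speculations")

// Assemble the many features into the single vector column spark.ml expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("volatility", "volume", "sentiment"))  // hypothetical
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setLabelCol("goodInvestment")  // 1.0 = profitable outcome (hypothetical)
  .setFeaturesCol("features")

// The fitted model predicts whether a new investment looks worthwhile.
val model = lr.fit(assembler.transform(outcomes))
```

Training on labeled historical outcomes is what lets such a model, in principle, improve as more real results accumulate.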

I also know that Goldman Sachs mentioned they are interested in Spark, one 
talk is at https://www.youtube.com/watch?v=HWwAoTK2YrQ, so this is more 
evidence of the financial industries paying attention to big data and 
Spark.

Regarding your app: I expected it to be similar to the first example where 
the signals you mention are real world events (e.g. the fed lowers 
interest rates or companies are rumoured to either be about to float or be 
acquired). 

At the 4:30 mark I think you actually identify previous index values and 
extrapolate what they are likely to become, so in theory your system 
would become more accurate over time, although would going off indexes 
alone be sufficient (if indeed this is what you're doing)?

I think you'd want to combine this with real world speculation/news to 
figure out *why* the price is likely to change, how much by and in which 
direction.

I agree that Apache Spark can be just the right tool for doing the heavy 
lifting required for analysis, computation and modelling of big data so 
looking forward to future Spark work in this area, and I wonder how we as 
Spark developers can make it easier/more powerful for Spark users to do so.




From:   "Taotao.Li" 
To: user 
Date:   30/08/2016 14:14
Subject:Hi, guys, does anyone use Spark in finance market?




Hi, guys,

 I'm a quant engineer in China, and I believe using Spark in the financial 
market is very promising. But I haven't found cases which combine 
Spark and finance.

So here I wanna do a small survey:

- do you guys use Spark in financial-market-related projects?
- if yes, how much data was fed into your Spark application?

 thanks a lot.

___
A little ad: I attended the IBM Spark Hackathon, which is here: 
http://apachespark.devpost.com/ , and I submitted a small application 
which will be used in my strategies. I hope you guys can give me a vote and 
some suggestions on how to use Spark in the financial market to discover 
trade opportunities.

here is my small app: 
http://devpost.com/software/spark-in-finance-quantitative-investing

thanks a lot.​


-- 
___
Quant | Engineer | Boy
___
blog:http://litaotao.github.io
github: www.github.com/litaotao




Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread fightf...@163.com
Hi, Siddharth
You can rebuild Spark with Maven by specifying -Dhadoop.version=2.5.0.

Thanks,
Sun.
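For Spark 1.2's build system, that would look roughly like the following (a sketch following the Spark 1.x building instructions; run from the Spark source root):

```shell
# Build a distribution against Hadoop 2.5.0: select the hadoop-2.4 profile
# (which covers 2.4 and later) and override the exact Hadoop version.
./make-distribution.sh --tgz -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests

# Or with plain Maven:
mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package
```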



fightf...@163.com
 
From: Siddharth Ubale
Date: 2015-01-30 15:50
To: user@spark.apache.org
Subject: Hi: hadoop 2.5 for spark
Hi,
 
I am a beginner with Apache Spark.
 
Can anyone let me know if it is mandatory to build Spark with the Hadoop 
version I am using, or can I use a pre-built package with my existing 
HDFS root folder?
I am using Hadoop 2.5.0 and want to use Apache Spark 1.2.0 with it.
I could see a pre-built version for 2.4 and above in the downloads section of 
the Spark homepage.
 
Siddharth Ubale,
Synchronized Communications 
#43, Velankani Tech Park, Block No. II, 
3rd Floor, Electronic City Phase I,
Bangalore – 560 100
Tel : +91 80 3202 4060
Web: www.syncoms.com
London|Bangalore|Orlando
 
we innovate, plan, execute, and transform the business​
 


Re: Hi: hadoop 2.5 for spark

2015-01-30 Thread bit1...@163.com
You can use the pre-built version that is built against Hadoop 2.4.




From: Siddharth Ubale
Date: 2015-01-30 15:50
To: user@spark.apache.org
Subject: Hi: hadoop 2.5 for spark
Hi,
 
I am a beginner with Apache Spark.
 
Can anyone let me know if it is mandatory to build Spark with the Hadoop 
version I am using, or can I use a pre-built package with my existing 
HDFS root folder?
I am using Hadoop 2.5.0 and want to use Apache Spark 1.2.0 with it.
I could see a pre-built version for 2.4 and above in the downloads section of 
the Spark homepage.
 
Siddharth Ubale,
Synchronized Communications 
#43, Velankani Tech Park, Block No. II, 
3rd Floor, Electronic City Phase I,
Bangalore – 560 100
Tel : +91 80 3202 4060
Web: www.syncoms.com
London|Bangalore|Orlando
 
we innovate, plan, execute, and transform the business​
 


RE: Hi

2014-08-20 Thread Shao, Saisai
Hi,

Actually, several Java task threads run inside a single executor, rather than 
separate processes, so each executor has only one JVM runtime, which is shared 
by the different task threads.

Thanks
Jerry
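In other words, each executor is one JVM whose heap is shared by its task threads; a sketch of how that is sized at submit time (the master URL, class, and jar names are hypothetical):

```shell
# One executor JVM gets the whole --executor-memory heap; --executor-cores
# task threads run inside that same JVM and share the heap, so there is
# no per-thread JVM or per-thread memory split to reason about.
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 4g \
  --executor-cores 4 \
  --class com.example.Main \
  app.jar
```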

From: rapelly kartheek [mailto:kartheek.m...@gmail.com]
Sent: Wednesday, August 20, 2014 5:29 PM
To: user@spark.apache.org
Subject: Hi

Hi
I have this doubt:

I understand that each Java process runs on a different JVM instance. Now, if I 
have a single executor on my machine and run several Java processes, then there 
will be several JVM instances running.

Now, PROCESS_LOCAL means the data is located on the same JVM as the task that 
is launched. But the memory associated with the entire executor is the same. So 
how does this memory get distributed across the JVMs? I mean, how does this 
memory get associated with multiple JVMs?
Thank you!!!
-karthik


Re: hi

2014-06-23 Thread Andrew Or
Ah, never mind. The 0.0.0.0 is for the UI, not for the Master, which uses the
output of the hostname command. But yes, long story short: go to the web
UI and use that URL.
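As a sketch of that workflow (the hostname shown is taken from the log quoted below; substitute whatever your master's web UI actually displays):

```shell
# Start the standalone master; it binds using the machine's hostname,
# not localhost, which is why spark://localhost:7077 fails to connect.
sbin/start-master.sh

# Open http://<master-host>:8080 and copy the spark:// URL shown at the
# top of the page, then use it for the shell:
MASTER=spark://karthik-OptiPlex-9020:7077 bin/spark-shell
```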


2014-06-23 11:13 GMT-07:00 Andrew Or and...@databricks.com:

 Hm, spark://localhost:7077 should work, because the standalone master
 binds to 0.0.0.0. Are you sure you ran `sbin/start-master.sh`?


 2014-06-22 22:50 GMT-07:00 Akhil Das ak...@sigmoidanalytics.com:

 Open your webUI in the browser and see the spark url in the top left
 corner of the page and use it while starting your spark shell instead of
 localhost:7077.

 Thanks
 Best Regards


 On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek 
 kartheek.m...@gmail.com wrote:

 Hi
   Can someone help me with the following error that I faced while
 setting up single node spark framework.

 karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
 MASTER=spark://localhost:7077 sbin/spark-shell
 bash: sbin/spark-shell: No such file or directory
 karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
 MASTER=spark://localhost:7077 bin/spark-shell
 Java HotSpot(TM) 64-Bit Server VM warning: ignoring option
 MaxPermSize=128m; support was removed in 8.0
 14/06/23 10:44:53 INFO spark.SecurityManager: Changing view acls to:
 karthik
 14/06/23 10:44:53 INFO spark.SecurityManager: SecurityManager:
 authentication disabled; ui acls disabled; users with view permissions:
 Set(karthik)
 14/06/23 10:44:53 INFO spark.HttpServer: Starting HTTP Server
 14/06/23 10:44:53 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/06/23 10:44:53 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:39588
 Welcome to
     __
  / __/__  ___ _/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/___/ .__/\_,_/_/ /_/\_\   version 1.0.0
   /_/

 Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java
 1.8.0_05)
 Type in expressions to have them evaluated.
 Type :help for more information.
 14/06/23 10:44:55 INFO spark.SecurityManager: Changing view acls to:
 karthik
 14/06/23 10:44:55 INFO spark.SecurityManager: SecurityManager:
 authentication disabled; ui acls disabled; users with view permissions:
 Set(karthik)
 14/06/23 10:44:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
 14/06/23 10:44:55 INFO Remoting: Starting remoting
 14/06/23 10:44:55 INFO Remoting: Remoting started; listening on
 addresses :[akka.tcp://spark@karthik-OptiPlex-9020:50294]
 14/06/23 10:44:55 INFO Remoting: Remoting now listens on addresses:
 [akka.tcp://spark@karthik-OptiPlex-9020:50294]
 14/06/23 10:44:55 INFO spark.SparkEnv: Registering MapOutputTracker
 14/06/23 10:44:55 INFO spark.SparkEnv: Registering BlockManagerMaster
 14/06/23 10:44:55 INFO storage.DiskBlockManager: Created local directory
 at /tmp/spark-local-20140623104455-3297
 14/06/23 10:44:55 INFO storage.MemoryStore: MemoryStore started with
 capacity 294.6 MB.
 14/06/23 10:44:55 INFO network.ConnectionManager: Bound socket to port
 60264 with id = ConnectionManagerId(karthik-OptiPlex-9020,60264)
 14/06/23 10:44:55 INFO storage.BlockManagerMaster: Trying to register
 BlockManager
 14/06/23 10:44:55 INFO storage.BlockManagerInfo: Registering block
 manager karthik-OptiPlex-9020:60264 with 294.6 MB RAM
 14/06/23 10:44:55 INFO storage.BlockManagerMaster: Registered
 BlockManager
 14/06/23 10:44:55 INFO spark.HttpServer: Starting HTTP Server
 14/06/23 10:44:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/06/23 10:44:55 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:38307
 14/06/23 10:44:55 INFO broadcast.HttpBroadcast: Broadcast server started
 at http://10.0.1.61:38307
 14/06/23 10:44:55 INFO spark.HttpFileServer: HTTP File server directory
 is /tmp/spark-082a44f6-e877-48cc-8ab7-1bcbcf8136b0
 14/06/23 10:44:55 INFO spark.HttpServer: Starting HTTP Server
 14/06/23 10:44:55 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/06/23 10:44:55 INFO server.AbstractConnector: Started
 SocketConnector@0.0.0.0:58745
 14/06/23 10:44:56 INFO server.Server: jetty-8.y.z-SNAPSHOT
 14/06/23 10:44:56 INFO server.AbstractConnector: Started
 SelectChannelConnector@0.0.0.0:4040
 14/06/23 10:44:56 INFO ui.SparkUI: Started SparkUI at
 http://karthik-OptiPlex-9020:4040
 14/06/23 10:44:56 WARN util.NativeCodeLoader: Unable to load
 native-hadoop library for your platform... using builtin-java classes where
 applicable
 14/06/23 10:44:56 INFO client.AppClient$ClientActor: Connecting to
 master spark://localhost:7077...
 14/06/23 10:44:56 INFO repl.SparkILoop: Created spark context..
 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect
 to akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 Spark context available as sc.

 scala 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not
 connect to akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 14/06/23 10:44:56 WARN 

Re: hi

2014-06-22 Thread Akhil Das
Open your webUI in the browser and see the spark url in the top left corner
of the page and use it while starting your spark shell instead of
localhost:7077.

Thanks
Best Regards


On Mon, Jun 23, 2014 at 10:56 AM, rapelly kartheek kartheek.m...@gmail.com
wrote:

 Hi
   Can someone help me with the following error that I faced while setting
 up single node spark framework.

 karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
 MASTER=spark://localhost:7077 sbin/spark-shell
 bash: sbin/spark-shell: No such file or directory
 karthik@karthik-OptiPlex-9020:~/spark-1.0.0$
 MASTER=spark://localhost:7077 bin/spark-shell
 14/06/23 10:44:56 INFO client.AppClient$ClientActor: Connecting to master
 spark://localhost:7077...
 14/06/23 10:44:56 INFO repl.SparkILoop: Created spark context..
 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to
 akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 Spark context available as sc.

 scala 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not
 connect to akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to
 akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 14/06/23 10:44:56 WARN client.AppClient$ClientActor: Could not connect to
 akka.tcp://sparkMaster@localhost:7077:
 akka.remote.EndpointAssociationException: Association failed with
 [akka.tcp://sparkMaster@localhost:7077]
 14/06/23 10:45:16 INFO