Re: Spark SQL and number of task

2016-08-04 Thread Marco Colombo
underline price paid. > > > Yong > > > -- > *From:* Takeshi Yamamuro <linguin@gmail.com> > *Sent:* Thursday, August 4, 2016 8:18 AM > *To:* Marco Colombo > *Cc:* user > *Subject:* Re: Spark SQL and number of task >

Re: Spark SQL and number of task

2016-08-04 Thread Marco Colombo
. > > // maropu > > > On Thu, Aug 4, 2016 at 4:58 PM, Marco Colombo <ing.marco.colo...@gmail.com> wrote: > >> Hi all, I have a question about how Hive and Spark handle data. >> >> I've started a new HiveContext and I'm extracting data from Cassandra. >>

Spark SQL and number of task

2016-08-04 Thread Marco Colombo
Hi all, I have a question about how Hive and Spark handle data. I've started a new HiveContext and I'm extracting data from Cassandra. I've configured spark.sql.shuffle.partitions=10. Now, I have the following query: select d.id, avg(d.avg) from v_points d where id=90 group by id; I see that 10 tasks are
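
For readers following this thread, here is a minimal sketch of why a GROUP BY on a single key can still schedule as many post-shuffle tasks as spark.sql.shuffle.partitions. It is plain Scala with no Spark dependency. The names `ShufflePartitionSketch`, `partitionFor`, and `nonEmptyPartitions` are hypothetical, and the non-negative modulo of `hashCode` is only a stand-in for Spark's hash partitioning, whose real hash function may differ.

```scala
object ShufflePartitionSketch {
  // Stand-in partitioner (assumption): non-negative modulo of the key's hashCode.
  def partitionFor(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Set of partition indices that actually receive at least one row.
  def nonEmptyPartitions(keys: Seq[Int], numPartitions: Int): Set[Int] =
    keys.map(k => partitionFor(k, numPartitions)).toSet

  def main(args: Array[String]): Unit = {
    val numPartitions = 10           // mirrors spark.sql.shuffle.partitions=10
    val keys = Seq.fill(1000)(90)    // every row has id=90, as in the query above
    val used = nonEmptyPartitions(keys, numPartitions)
    // One task is scheduled per shuffle partition, whether or not it holds rows.
    println(s"tasks scheduled: $numPartitions, partitions with data: ${used.size}")
  }
}
```

Under these assumptions every row lands in one partition, so only one of the ten tasks has rows to aggregate. The others still run, because the number of tasks follows the number of partitions rather than the number of distinct keys.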

Re: Possible to push sub-queries down into the DataSource impl?

2016-07-27 Thread Marco Colombo
g if Spark has > the hooks to allow me to try ;-) > > Cheers, > Tim > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > > -- Ing. Marco Colombo

Re: jdbcRRD and dataframe

2016-07-25 Thread Marco Colombo
laimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from

Re: jdbcRRD and dataframe

2016-07-25 Thread Marco Colombo
> > > http://talebzadehmich.wordpress.com

jdbcRRD and dataframe

2016-07-25 Thread Marco Colombo
Hi all, I was using JdbcRDD, whose constructor accepts a function to get a DB connection. This is very useful for providing my own connection handler. I'm evaluating a move to DataFrame, but I cannot see how to provide such a function and migrate my code. I want to use my own

Re: Hive and distributed sql engine

2016-07-25 Thread Marco Colombo
ppens from each executor, so you must have > a connection or a pool of connections per worker. Executors of the same > worker can share a connection pool. > > Best > Ayan > On 25 Jul 2016 16:48, "Marco Colombo" <ing.marco.colo...@gmail.com>

Hive and distributed sql engine

2016-07-25 Thread Marco Colombo
to rewrite them as UDAF? Thanks! -- Ing. Marco Colombo

Re: Fast database with writes per second and horizontal scaling

2016-07-22 Thread Marco Colombo
o.com> wrote: >>> >>> >>> Hi Gurus, >>> >>> Advice appreciated from Hive gurus. >>> >>> My colleague has been using Cassandra. However, he says it is too slow >>> and not user friendly. >>> MongoDB as a doc database is pretty neat but not fast enough. >>> >>> My main concern is fast writes per second and good scaling. >>> >>> >>> Hive on Spark or Tez? >>> >>> How about HBase, or anything else? >>> >>> Any expert advice warmly acknowledged. >>> >>> thanking you >>> >>> >>> >> >> >> -- >> Best Regards, >> Ayan Guha >> > > -- Ing. Marco Colombo

Re: spark and plot data

2016-07-22 Thread Marco Colombo
> my question: we don't have tools for plotting data, so each time we have to > switch and go back to Python to plot. > But when you have a large result (a scatter plot or ROC curve) you can't use > collect to take the data. > > Does someone have a proposal for plotting? > > thanks > > -- Ing. Marco Colombo

Re: HiveThriftServer2.startWithContext no more showing tables in 1.6.2

2016-07-21 Thread Marco Colombo

HiveThriftServer2.startWithContext no more showing tables in 1.6.2

2016-07-21 Thread Marco Colombo
Hi all, I have a Spark application that was working in 1.5.2, but now has a problem in 1.6.2. Here is an example: val conf = new SparkConf() .setMaster("spark://10.0.2.15:7077") .setMaster("local") .set("spark.cassandra.connection.host", "10.0.2.15")

Error starting thrift server on Spark

2016-07-11 Thread Marco Colombo
Hi all, I cannot start the Thrift server on Spark 1.6.2. I've configured the binding port and IP and left the default metastore. In the logs I get: 16/07/11 22:51:46 INFO NettyBlockTransferService: Server created on 46717 16/07/11 22:51:46 INFO BlockManagerMaster: Trying to register BlockManager 16/07/11 22:51:46

Re: Exposing dataframe via thrift server

2016-03-31 Thread Marco Colombo
+--+--+ > 5 rows selected (0.126 seconds) > > > > It shows tables that are persisted in the Hive metastore using saveAsTable. > Temp tables (registerTempTable) cannot be viewed. > > Can anyone help me with this? > Thanks > -- Ing. Marco Colombo

Re: Spark and DB connection pool

2016-03-25 Thread Marco Colombo
Sent from the Apache Spark User List mailing list archive at Nabble.com. > > -- Ing. Marco Colombo

Re: Reading Back a Cached RDD

2016-03-24 Thread Marco Colombo
ist(), is it then possible to come back later and >>> access the persisted RDD? >>> >>> Let's say, for instance, coming back and starting a new Spark shell >>> session. How would one access the persisted RDD in the new shell session? >>> >>> >>> Thanks, >>> >>> -- >>> >>>Nick >>> >> > > > -- > Cell : 425-233-8271 > Twitter: https://twitter.com/holdenkarau > -- Ing. Marco Colombo