Re: Understanding how spark share db connections created on driver

2017-06-29 Thread Gourav Sengupta
Hi Sotiris,

Can you upload some sample data? Then I will write the code and send it
across to you.

Do you have any particular use case for InfluxDB?


Regards,
Gourav Sengupta


Re: Understanding how spark share db connections created on driver

2017-06-29 Thread Sotiris Beis
Hi Gourav,

Do you have any suggestions on how the use of dataframes would solve my
problem?

Cheers,
Sotiris


Re: Understanding how spark share db connections created on driver

2017-06-29 Thread Gourav Sengupta
Hi,

I still do not understand why people do not use data frames.

It makes you smile, take a sip of fine coffee, and feel good about life, and
it's all courtesy of Spark. :)

Regards,
Gourav Sengupta
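
For illustration, here is a minimal sketch of what a DataFrame-based version
of the job in this thread might look like. This is an assumption, not code
from the thread: the column names "a" and "b" are inferred from the Row
pattern in the original snippet, and Processor.process is the poster's own
method, wrapped in a UDF here.

import org.apache.spark.sql.functions.{col, udf}

// Wrap the existing row-level logic so it can run inside the DataFrame API.
val processUdf = udf((a: String, b: String) => Processor.process(a, b))

val annotated = spark.read
  .parquet(Config.input)
  .withColumn("annotated", processUdf(col("a"), col("b")))
  .cache()

Staying in the DataFrame API keeps Catalyst's optimizations, which are lost
as soon as the job drops down to df.rdd; that may be what is being alluded
to here.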


Re: Understanding how spark share db connections created on driver

2017-06-29 Thread Ryan
I think it creates a new connection on each worker: whenever the Processor
references Resources, it gets initialized there.
There's no need for the driver to connect to the db in this case.
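
To make the mechanism concrete: a Scala object is a per-JVM singleton that
is initialized lazily, on first reference, so each executor builds its own
client independently. A minimal sketch, reusing names from the original post
(MetricsClient stands in for the poster's own class):

object Resources {
  // This body runs once per JVM, on first access to any member: the driver
  // gets its own instance if it ever touches Resources, and each executor
  // JVM constructs a separate one. Nothing is serialized or shipped.
  val influxDB = new MetricsClient("localhost", 30089, "spark")
}

df.rdd.map { case Row(a: String, b: String) =>
  Processor.process(a, b) // first call on each executor initializes Resources there
}

If the connection lifecycle should be explicit instead of hidden in object
initialization, the usual alternative is to open and close one connection
per partition with mapPartitions or foreachPartition.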


Understanding how spark share db connections created on driver

2017-06-29 Thread salvador
Hi all,

I am writing a Spark job from which, at some point, I want to send some
metrics to InfluxDB. Here is some sample code showing how I am doing it at
the moment.

I have a Resources object which contains all the details for the db
connection:

object Resources {
  // Accessing this no-op member from the driver forces the Resources
  // singleton, and therefore the client below, to be initialized there.
  def forceInit: () => Unit = () => ()

  val influxHost: String = Config.influxHost.getOrElse("localhost")
  val influxUdpPort: Int = Config.influxUdpPort.getOrElse(30089)

  val influxDB = new MetricsClient(influxHost, influxUdpPort, "spark")
}

This is what my code on the driver looks like:

object ProcessStuff extends App {
  val spark = SparkSession.builder().config(sparkConfig).getOrCreate()
  val df = spark.read.parquet(Config.input)

  Resources.forceInit

  val annotatedSentences = df.rdd
    .map { case Row(a: String, b: String) => Processor.process(a, b) }
    .cache()
}

I am sending all the metrics I want from the process() method, which uses
the client I initialised in the driver code. Currently this works and I am
able to send millions of data points. I was just wondering how it works
internally: does it share the db connection or create a new connection every
time?
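
(For context, process() is shaped roughly like this; the send method on
MetricsClient below is a placeholder, since the real client API isn't shown
above:)

object Processor {
  def process(a: String, b: String): String = {
    val result = a + " " + b          // stand-in for the real processing
    Resources.influxDB.send(result)   // placeholder call: emit a metric point
    result
  }
}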