Hi,

>
>    1. Adding new Spark dialects for various DBs (WIP)
>
I have added new Spark JDBC dialects for the following DBs (a rough sketch of such a dialect follows the list):

   - MySQL
   - MSSQL
   - Oracle
   - Postgres
   - DB2
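
For reference, a dialect along these lines extends Spark's JdbcDialect and is
registered via JdbcDialects. This is only a minimal sketch, not the actual
connector code; the MSSQL URL prefix and the type mappings shown here are
illustrative assumptions:

import java.sql.Types

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types._

// Sketch of a custom dialect, e.g. for MSSQL. canHandle() selects the
// dialect based on the JDBC URL prefix.
case object MsSqlDialectSketch extends JdbcDialect {

  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:sqlserver")

  // Map Spark SQL types to DB-specific column definitions; anything not
  // handled here falls back to Spark's defaults. (Illustrative mappings.)
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType  => Some(JdbcType("NVARCHAR(MAX)", Types.NVARCHAR))
    case BooleanType => Some(JdbcType("BIT", Types.BIT))
    case _           => None
  }
}

// Registered once at startup:
// JdbcDialects.registerDialect(MsSqlDialectSketch)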


> No, not incremental data processing. My question is regarding deleting
> the entire summary table's records and re-inserting them. IMO, doing an
> upsert will be more efficient than your approach above. Again, if there is
> no other option, is the re-insert done as a batch operation or are you
> inserting records one by one?
>

Yes. We had this implementation in the Hive JDBC handler, and it is
possible to follow a similar approach here: if upsert queries are supported
by the DB (most DBs support merge queries), we can let the user specify the
upsert query to be used in the Spark script, and if an upsert query is not
provided, we can check whether the records exist in the table using the
primary key and update or insert the records accordingly. But IMO, since we
support update operations in our data layer (when CarbonAnalytics is used),
we shouldn't worry that much about supporting it in the Carbon Spark JDBC
connector. On the other hand, if that is how Spark supports insert
into/overwrite by default, we should follow the same approach.
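
To illustrate the fallback path described above (primary-key check, then
update or insert) over plain JDBC, here is a rough sketch; the table and
column names (STATS_SUMMARY, ID, CNT) are hypothetical, purely for
illustration:

import java.sql.Connection

// Sketch of a primary-key based upsert fallback. A user-supplied
// MERGE/upsert query would replace this two-step logic entirely.
def upsert(conn: Connection, id: String, count: Long): Unit = {
  // 1. Check whether a record with this primary key already exists.
  val check = conn.prepareStatement("SELECT 1 FROM STATS_SUMMARY WHERE ID = ?")
  check.setString(1, id)
  val rs = check.executeQuery()
  val exists = rs.next()
  rs.close()
  check.close()

  // 2. Update the existing record, or insert a new one.
  val sql =
    if (exists) "UPDATE STATS_SUMMARY SET CNT = ? WHERE ID = ?"
    else "INSERT INTO STATS_SUMMARY (CNT, ID) VALUES (?, ?)"
  val stmt = conn.prepareStatement(sql)
  stmt.setLong(1, count)
  stmt.setString(2, id)
  stmt.executeUpdate()
  stmt.close()
}

Either way, to Gihan's point about batch vs. record-by-record, the
statements should be batched (addBatch()/executeBatch()) rather than
executed one row at a time.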

On Thu, Aug 13, 2015 at 8:18 AM, Gihan Anuruddha <gi...@wso2.com> wrote:

> Hi Niranda,
>
> No, not incremental data processing. My question is regarding deleting
> the entire summary table's records and re-inserting them. IMO, doing an
> upsert will be more efficient than your approach above. Again, if there is
> no other option, is the re-insert done as a batch operation or are you
> inserting records one by one?
>
> Regards,
> Gihan
>
> On Wed, Aug 12, 2015 at 11:40 AM, Niranda Perera <nira...@wso2.com> wrote:
>
>> Hi Gihan,
>>
>> Are we talking about incremental processing here? Insert into/overwrite
>> queries will normally be used to push analyzed data into summary tables.
>>
>> In Spark jargon, "insert overwrite table" means completely deleting the
>> table and recreating it. I'm a bit confused about the meaning of
>> 'overwrite' with respect to the previous 2.5.0 Hive scripts; were we doing
>> an update there?
>>
>> rgds
>>
>> On Tue, Aug 11, 2015 at 7:58 PM, Gihan Anuruddha <gi...@wso2.com> wrote:
>>
>>> Hi Niranda,
>>>
>>> Are we going to solve those limitations before the GA? Especially
>>> limitation no. 2: over time we can have a stats table with thousands of
>>> records, so are we going to remove all the records and re-insert them
>>> every time the Spark script runs?
>>>
>>> Regards,
>>> Gihan
>>>
>>> On Tue, Aug 11, 2015 at 7:13 AM, Niranda Perera <nira...@wso2.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> we have implemented a custom Spark JDBC connector to be used in the
>>>> Carbon environment.
>>>>
>>>> this enables the following
>>>>
>>>>    1. Temporary tables can now be created in the Spark environment by
>>>>    specifying an analytics datasource (configured in
>>>>    analytics-datasources.xml) and a table
>>>>    2. Spark uses the "SELECT 1 FROM $table LIMIT 1" query to check the
>>>>    existence of a table, but the LIMIT clause is not supported by all
>>>>    DBs. With the new connector, this query can be provided as a config
>>>>    (this config is still WIP)
>>>>    3. Adding new Spark dialects for various DBs (WIP)
>>>>
>>>> the idea is to test this for the following dbs
>>>>
>>>>    - MySQL
>>>>    - H2
>>>>    - MSSQL
>>>>    - Oracle
>>>>    - Postgres
>>>>    - DB2
>>>>
>>>> I have loosely tested the connector with MySQL, and I would like the
>>>> APIM team to use it with the API usage stats use case and provide us
>>>> with some feedback.
>>>>
>>>> the connector can be used as follows (docs are not updated yet;
>>>> I will do that ASAP):
>>>>
>>>> create temporary table <temp_table> using CarbonJDBC options
>>>> (dataSource "<datasource name>", tableName "<table name>");
>>>>
>>>> select * from <temp_table>
>>>>
>>>> insert into/overwrite table <temp_table> <some select statement>
>>>>
>>>> known limitations
>>>>
>>>>    1. when creating a temp table, the table should already exist in the
>>>>    underlying datasource
>>>>    2. "insert overwrite table" deletes the existing table and creates
>>>>    it again
>>>>
>>>>
>>>> I would be very grateful if you could use this connector in your current
>>>> JDBC use cases and provide us with feedback.
>>>>
>>>> best
>>>> --
>>>> *Niranda Perera*
>>>> Software Engineer, WSO2 Inc.
>>>> Mobile: +94-71-554-8430
>>>> Twitter: @n1r44 <https://twitter.com/N1R44>
>>>> https://pythagoreanscript.wordpress.com/
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> W.G. Gihan Anuruddha
>>> Senior Software Engineer | WSO2, Inc.
>>> M: +94772272595
>>>
>>>
>>>
>>
>>
>> --
>> *Niranda Perera*
>> Software Engineer, WSO2 Inc.
>> Mobile: +94-71-554-8430
>> Twitter: @n1r44 <https://twitter.com/N1R44>
>> https://pythagoreanscript.wordpress.com/
>>
>
>
>
> --
> W.G. Gihan Anuruddha
> Senior Software Engineer | WSO2, Inc.
> M: +94772272595
>
>
>


-- 
Thanks & Regards,

Inosh Goonewardena
Associate Technical Lead- WSO2 Inc.
Mobile: +94779966317
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
