Re: Datasource v2 Select Into support

2018-09-19 Thread Ryan Blue
Ross,

The problem you're hitting is that there aren't many logical plans that
work with the v2 source API yet. Here, the SQL statement is parsed into an
InsertIntoTable logical plan, which can't be converted to a physical plan
because there is no analysis rule to convert it to the corresponding v2
logical plan, AppendData.
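The missing piece is roughly a rule like the following minimal sketch,
written against the 2.4-era internals (InsertIntoTable, DataSourceV2Relation,
and the new AppendData plan); it is illustrative, not the exact rule we're
proposing:

import org.apache.spark.sql.catalyst.plans.logical.{AppendData, InsertIntoTable, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation

// Rewrite the parsed InsertIntoTable plan into the v2 AppendData plan when
// the target is a v2 relation. Overwrites and partitioned inserts are left
// alone here; they map to other v2 plans.
object RewriteV2Inserts extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
    case InsertIntoTable(r: DataSourceV2Relation, partition, query, overwrite, ifNotExists)
        if partition.isEmpty && !overwrite && !ifNotExists =>
      AppendData.byPosition(r, query)
  }
}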

What we need to do next is to get the new logical plans for v2 into Spark,
like CreateTableAsSelect and ReplacePartitionsDynamic, and then add
analysis rules to convert the plans created by the SQL parser to those
v2 plans. The reason we're adding new logical plans is to clearly
define the behavior of these queries, including the rules that validate
that data can be written and how the operation is implemented, such as
creating a table before writing to it for CTAS.
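As a sketch of what one of those plans might look like (a hypothetical,
simplified shape, not the actual proposal; TableCatalog is sketched further
below), the v2 CTAS plan carries the catalog, the table identifier, and the
query, so "create the table, then append the query result" is part of the
plan's contract rather than left to each source:

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical, simplified shape of a v2 CTAS logical plan.
case class CreateTableAsSelect(
    catalog: TableCatalog,              // proposed API, sketched below
    table: TableIdentifier,
    query: LogicalPlan,
    writeOptions: Map[String, String],
    ignoreIfExists: Boolean) extends LogicalPlan {
  override def children: Seq[LogicalPlan] = Seq(query)
  override def output: Seq[Attribute] = Nil
}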

I currently have a working implementation of this, but getting it into
upstream Spark is blocked on the issue to add the TableCatalog API.
I'd love to get that in so we can get more of our implementation submitted.
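For anyone who hasn't followed the catalog threads, TableCatalog is roughly
a catalog plugin with create/load/drop operations for tables. The sketch
below is an approximation for illustration only; the names and signatures
are not the final interface:

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.types.StructType

// Approximate, illustrative sketch of the proposed TableCatalog API.
trait TableCatalog {
  def loadTable(ident: TableIdentifier): Table
  def createTable(
      ident: TableIdentifier,
      schema: StructType,
      properties: Map[String, String]): Table
  def dropTable(ident: TableIdentifier): Boolean
}

// A Table is a handle exposing the metadata the new plans need.
trait Table {
  def name: String
  def schema: StructType
}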

rb

On Thu, Sep 6, 2018 at 6:54 AM Wenchen Fan wrote:

> Data source v2 catalog support (table/view) is still in progress. There are
> several threads in the dev list discussing it, please join the discussion
> if you are interested. Thanks for trying!
>
> On Thu, Sep 6, 2018 at 7:23 PM Ross Lawley wrote:
>
>> Hi,
>>
>> I hope this is the correct mailing list. I've been adding v2 support to
>> the MongoDB Spark connector using Spark 2.3.1. I've noticed that one of
>> my tests passes when using the original DefaultSource but fails with my
>> v2 implementation:
>>
>> The code I'm running is:
>> val df = spark.loadDS[Character]()
>> df.createOrReplaceTempView("people")
>> spark.sql("INSERT INTO table people SELECT 'Mort', 1000")
>>
>> The error I see is:
>> unresolved operator 'InsertIntoTable DataSourceV2Relation [name#0,
>> age#1], MongoDataSourceReader ...
>> 'InsertIntoTable DataSourceV2Relation [name#0, age#1],
>> MongoDataSourceReader 
>> +- Project [Mort AS Mort#7, 1000 AS 1000#8]
>>+- OneRowRelation
>>
>> My DefaultSource V2 implementation extends DataSourceV2 with ReadSupport
>> with ReadSupportWithSchema with WriteSupport
>>
>> I'm wondering if there is something I'm not implementing, if there is
>> a bug in my implementation, or if it's an issue with Spark?
>>
>> Any pointers would be great,
>>
>> Ross
>>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: Datasource v2 Select Into support

2018-09-06 Thread Wenchen Fan
Data source v2 catalog support (table/view) is still in progress. There are
several threads in the dev list discussing it, please join the discussion
if you are interested. Thanks for trying!

On Thu, Sep 6, 2018 at 7:23 PM Ross Lawley wrote:

> Hi,
>
> I hope this is the correct mailing list. I've been adding v2 support to the
> MongoDB Spark connector using Spark 2.3.1. I've noticed that one of my tests
> passes when using the original DefaultSource but fails with my v2
> implementation:
>
> The code I'm running is:
> val df = spark.loadDS[Character]()
> df.createOrReplaceTempView("people")
> spark.sql("INSERT INTO table people SELECT 'Mort', 1000")
>
> The error I see is:
> unresolved operator 'InsertIntoTable DataSourceV2Relation [name#0, age#1],
> MongoDataSourceReader ...
> 'InsertIntoTable DataSourceV2Relation [name#0, age#1],
> MongoDataSourceReader 
> +- Project [Mort AS Mort#7, 1000 AS 1000#8]
>+- OneRowRelation
>
> My DefaultSource V2 implementation extends DataSourceV2 with ReadSupport
> with ReadSupportWithSchema with WriteSupport
>
> I'm wondering if there is something I'm not implementing, if there is a
> bug in my implementation, or if it's an issue with Spark?
>
> Any pointers would be great,
>
> Ross
>


Datasource v2 Select Into support

2018-09-06 Thread Ross Lawley
Hi,

I hope this is the correct mailing list. I've been adding v2 support to the
MongoDB Spark connector using Spark 2.3.1. I've noticed that one of my tests
passes when using the original DefaultSource but fails with my v2
implementation:

The code I'm running is:
val df = spark.loadDS[Character]()
df.createOrReplaceTempView("people")
spark.sql("INSERT INTO table people SELECT 'Mort', 1000")

The error I see is:
unresolved operator 'InsertIntoTable DataSourceV2Relation [name#0, age#1],
MongoDataSourceReader ...
'InsertIntoTable DataSourceV2Relation [name#0, age#1],
MongoDataSourceReader 
+- Project [Mort AS Mort#7, 1000 AS 1000#8]
   +- OneRowRelation

My DefaultSource V2 implementation extends DataSourceV2 with ReadSupport
with ReadSupportWithSchema with WriteSupport
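
In skeleton form, it follows the Spark 2.3.x v2 interfaces as in the sketch
below, with the Mongo-specific method bodies elided (the real code builds
the MongoDataSourceReader and a corresponding writer):

import java.util.Optional

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.sources.v2._
import org.apache.spark.sql.sources.v2.reader.DataSourceReader
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter
import org.apache.spark.sql.types.StructType

class DefaultSource extends DataSourceV2
    with ReadSupport with ReadSupportWithSchema with WriteSupport {

  // Read path when Spark asks the source to infer the schema.
  override def createReader(options: DataSourceOptions): DataSourceReader =
    ??? // connector-specific, e.g. builds the MongoDataSourceReader

  // Read path when the user supplies a schema.
  override def createReader(schema: StructType, options: DataSourceOptions): DataSourceReader =
    ??? // connector-specific

  // Write path used by df.write.format(...).save(). Implementing this does
  // not, on its own, make INSERT INTO plannable; that is the gap described
  // in the replies above.
  override def createWriter(
      jobId: String,
      schema: StructType,
      mode: SaveMode,
      options: DataSourceOptions): Optional[DataSourceWriter] =
    ??? // connector-specific
}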

I'm wondering if there is something I'm not implementing, if there is a
bug in my implementation, or if it's an issue with Spark?

Any pointers would be great,

Ross