[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836162#comment-16836162
 ] 

Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:35 AM:
---

I think having Arrow as a calling convention could improve performance 
considerably.

[~julianhyde], do you mean that a new kind of Enumerable-style implementation 
needs to be introduced for Filter, Project, Aggregate and TableScan?

I'm just getting familiar with Arrow. I would be glad to try making Arrow a 
calling convention in Calcite.


was (Author: hhlai1990):
I think it may improve a lot of performance  if we have Arrow as a calling 
convention.

[~julianhyde],Do you mean a new kind of  Enumerable-implementations for Filter, 
Project, Aggregate and TableScan need to be introduced ?

I'm just getting familiar with Arrow.I will have a try to make Arrow as a 
calling convention.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
>
> Create an adapter for [Apache Arrow|http://arrow.apache.org/]. This would 
> allow people to execute SQL statements, via JDBC or ODBC, on data stored in 
> Arrow in-memory format.
> Since Arrow is an in-memory format, it is not as straightforward as reading, 
> say, CSV files using the file adapter: an Arrow data set does not have a URL. 
> (Unless we use Arrow's 
> [Feather|https://blog.cloudera.com/blog/2016/03/feather-a-fast-on-disk-format-for-data-frames-for-r-and-python-powered-by-apache-arrow/]
>  format, or use an in-memory file system such as Alluxio.) So we would need 
> to devise a way of addressing Arrow data sets.
> Also, since Arrow is an extremely efficient format for processing data, it 
> would be good to have Arrow as a calling convention. That is, we would provide 
> implementations of relational operators such as Filter, Project and Aggregate, 
> in addition to just TableScan.
> Lastly, when we have an Arrow convention, if we build adapters for file 
> formats (for instance the bioinformatics formats SAM, VCF, FASTQ discussed in 
> CALCITE-2025) it would make a lot of sense to translate those formats 
> directly into Arrow (applying simple projects and filters first if 
> applicable). Those adapters would fit better as a "contrib" module in the 
> Arrow project than in Calcite.
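
To illustrate what a columnar calling convention means for Filter and Project, here is a minimal sketch in plain Java. It does not use Calcite's or Arrow's APIs: the class `ColumnarSketch`, its methods and the sample columns are hypothetical and chosen only for illustration. The filter produces a selection vector of matching row indices, and the project gathers the selected values from another column, so operators pass whole columns between each other rather than one row at a time.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Calcite or Arrow API. */
final class ColumnarSketch {
  /** Returns the indices of rows whose value in the column exceeds the threshold. */
  static int[] filterGreaterThan(int[] column, int threshold) {
    int[] selection = new int[column.length];
    int n = 0;
    for (int i = 0; i < column.length; i++) {
      if (column[i] > threshold) {
        selection[n++] = i;
      }
    }
    // Trim the selection vector to the number of matching rows.
    return Arrays.copyOf(selection, n);
  }

  /** Gathers the selected rows of one column into a new dense column. */
  static int[] project(int[] column, int[] selection) {
    int[] out = new int[selection.length];
    for (int i = 0; i < selection.length; i++) {
      out[i] = column[selection[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    // Two columns of the same (made-up) four-row table.
    int[] deptno = {10, 20, 30, 20};
    int[] salary = {100, 250, 300, 50};
    // Roughly: SELECT deptno FROM t WHERE salary > 90
    int[] selection = filterGreaterThan(salary, 90);
    System.out.println(Arrays.toString(project(deptno, selection))); // prints [10, 20, 30]
  }
}
```

In a real Arrow convention the columns would be Arrow vectors rather than `int[]`, and the operators would be Calcite relational expressions; the sketch only shows the shape of the data flow.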



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Lai Zhou (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836162#comment-16836162
 ] 

Lai Zhou edited comment on CALCITE-2040 at 5/9/19 7:40 AM:
---

I think having Arrow as a calling convention could improve performance 
considerably.

[~julianhyde], do you mean that a new kind of Enumerable-style implementation 
needs to be introduced for Filter, Project, Aggregate and TableScan?

I found that someone has already done part of this on GitHub.

See 
[https://github.com/masayuki038/calcite-arrow-sample/blob/master/src/main/scala/net/wrap_trap/calcite_arrow_sample/ArrowTranslatableTable.scala]

It may be a good start.

I'm just getting familiar with Arrow. I would be glad to try making Arrow a 
calling convention in Calcite.


was (Author: hhlai1990):
I think it may improve a lot of performance  if we have Arrow as a calling 
convention.

[~julianhyde],Do you mean a new kind of  Enumerable-implementations for Filter, 
Project, Aggregate and TableScan need to be introduced ?

I'm just getting familiar with Arrow.I'm glad to have a try on making Arrow as 
a calling convention in Calcite.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2019-05-09 Thread Masayuki Takahashi (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836356#comment-16836356
 ] 

Masayuki Takahashi edited comment on CALCITE-2040 at 5/9/19 1:02 PM:
-

[calcite_arrow_sample|https://github.com/masayuki038/calcite-arrow-sample] is 
just a sample. This implementation actually uses the Enumerable implementation.

Separately, I am trying to implement an Apache Arrow adapter: 
https://issues.apache.org/jira/browse/CALCITE-2173


was (Author: masayuki038):
[calcite_arrow_sample|https://github.com/masayuki038/calcite-arrow-sample] is a 
just sample. This implementation use actually Enumerable implementation.

On the other hand, I try to implement Apache Arrow adaper. 
https://issues.apache.org/jira/browse/CALCITE-2173

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2021-04-09 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318393#comment-17318393
 ] 

Julian Hyde edited comment on CALCITE-2040 at 4/10/21, 12:32 AM:
-

As [~mmior] [pointed out on 
dev@calcite|https://lists.apache.org/thread.html/r56003ae9392e9b759f46a5d94b7571a887a38712134753f7c9b33514%40%3Cdev.calcite.apache.org%3E],
 [PR 2133|https://github.com/apache/calcite/pull/2133] is ready for review.

I plan to fix it up so that it builds and runs in CI (except in AppVeyor, due 
to issues noted in [Arrow/Gandiva dependency management in 
Java|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E]).

Assigning to myself, since I am reviewing and fixing up. My dev branch will be 
[julianhyde/2040-arrow|https://github.com/julianhyde/calcite/tree/2040-arrow].


was (Author: julianhyde):
As [~mmior] [pointed out on 
dev@calcite|https://lists.apache.org/thread.html/r56003ae9392e9b759f46a5d94b7571a887a38712134753f7c9b33514%40%3Cdev.calcite.apache.org%3E],
 [PR 2133|https://github.com/apache/calcite/pull/2133] is ready for review.

I plan to fix it up so that it builds and runs in CI (except in AppVeyor, due 
to issues noted in [Arrow/Gandiva dependency management in 
Java|https://lists.apache.org/thread.html/r93a4fedb499c746917ab8d62cf5a8db8c93a7f24bc9fac81f90bedaa%40%3Cuser.arrow.apache.org%3E].

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Michael Mior
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2021-04-19 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325314#comment-17325314
 ] 

Julian Hyde edited comment on CALCITE-2040 at 4/19/21, 9:05 PM:


If anyone is willing and able to fix ARROW-11135, I would be grateful.

It seems that ARROW-11135 is not going to make it into Arrow 4.0 (they are just 
rolling out the first release candidate). No one seems interested in working on 
it. This case is blocked until that bug is fixed; once it is fixed and released 
to Maven Central, we should be able to merge the fix to this bug within a couple 
of days.


was (Author: julianhyde):
It seems that ARROW-11135 is not going to make it into Arrow 4.0 (they are just 
rolling out the first release candidate). No one seems interested in working on 
it. This case is blocked until that bug is fixed.

If anyone is willing and able to fix ARROW-11135, I would be grateful.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: arrow_data.py
>
> Time Spent: 50m
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2022-05-18 Thread Jonathan Swenson (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538626#comment-17538626
 ] 

Jonathan Swenson edited comment on CALCITE-2040 at 5/18/22 7:28 AM:


For reference, the linker error I get on an M1 Mac when running the Arrow tests 
is: 
{code:java}
FAILURE   2.3sec, org.apache.calcite.adapter.arrow.ArrowAdapterTest > testArrowProjectFieldsWithFloatFilter()
    java.lang.UnsatisfiedLinkError: /private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8: dlopen(/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8, 0x0001): tried: '/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1950)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1832)
        at java.lang.Runtime.load0(Runtime.java:811)
        at java.lang.System.load(System.java:1088)
        at org.apache.arrow.gandiva.evaluator.JniLoader.loadGandivaLibraryFromJar(JniLoader.java:74)
        at org.apache.arrow.gandiva.evaluator.JniLoader.setupInstance(JniLoader.java:63)
        at org.apache.arrow.gandiva.evaluator.JniLoader.getInstance(JniLoader.java:53)
        at org.apache.arrow.gandiva.evaluator.JniLoader.getDefaultConfiguration(JniLoader.java:144)
        at org.apache.arrow.gandiva.evaluator.Filter.make(Filter.java:67)
{code}
 


was (Author: jswenson):
For reference the linker error I get on an M1 mac is: 
{code:java}
FAILURE   2.3sec, org.apache.calcite.adapter.arrow.ArrowAdapterTest > 
testArrowProjectFieldsWithFloatFilter()
    java.lang.UnsatisfiedLinkError: 
/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8:
 
dlopen(/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8,
 0x0001): tried: 
'/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8'
 (mach-o file, but is an incompatible architecture (have 'x86_64', need 
'arm64e'))
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1950)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1832)
        at java.lang.Runtime.load0(Runtime.java:811)
        at java.lang.System.load(System.java:1088)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.loadGandivaLibraryFromJar(JniLoader.java:74)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.setupInstance(JniLoader.java:63)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.getInstance(JniLoader.java:53)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.getDefaultConfiguration(JniLoader.java:144)
        at org.apache.arrow.gandiva.evaluator.Filter.make(Filter.java:67) {code}
 

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: arrow_data.py
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2022-05-18 Thread Jonathan Swenson (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538626#comment-17538626
 ] 

Jonathan Swenson edited comment on CALCITE-2040 at 5/18/22 5:46 PM:


For reference, the linker error I get on an M1 Mac when running the Arrow tests 
is: 
{code:java}
FAILURE   2.3sec, org.apache.calcite.adapter.arrow.ArrowAdapterTest > testArrowProjectFieldsWithFloatFilter()
    java.lang.UnsatisfiedLinkError: /private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8: dlopen(/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8, 0x0001): tried: '/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1950)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1832)
        at java.lang.Runtime.load0(Runtime.java:811)
        at java.lang.System.load(System.java:1088)
        at org.apache.arrow.gandiva.evaluator.JniLoader.loadGandivaLibraryFromJar(JniLoader.java:74)
        at org.apache.arrow.gandiva.evaluator.JniLoader.setupInstance(JniLoader.java:63)
        at org.apache.arrow.gandiva.evaluator.JniLoader.getInstance(JniLoader.java:53)
        at org.apache.arrow.gandiva.evaluator.JniLoader.getDefaultConfiguration(JniLoader.java:144)
        at org.apache.arrow.gandiva.evaluator.Filter.make(Filter.java:67)
{code}
Filed https://issues.apache.org/jira/browse/ARROW-16608 to track this issue. 
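
A minimal sketch of how the architecture mismatch in the error above could be detected before the native library is loaded. The class `NativeArchCheck` and its methods are hypothetical and not part of Calcite or Arrow, and the assumption that the bundled `libgandiva_jni` is built only for x86_64 is taken solely from the `have 'x86_64', need 'arm64e'` part of the message.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Calcite or Arrow. */
final class NativeArchCheck {
  /** Maps common architecture spellings onto two canonical names. */
  static String normalize(String arch) {
    String a = arch.toLowerCase(Locale.ROOT);
    if (a.equals("amd64") || a.equals("x86_64")) {
      return "x86_64";
    }
    if (a.equals("aarch64") || a.startsWith("arm64")) {
      return "arm64"; // also covers the "arm64e" spelling seen in the dlopen message
    }
    return a;
  }

  /** True if a library built for libraryArch matches the architecture the JVM reports. */
  static boolean canLoad(String libraryArch, String jvmArch) {
    return normalize(libraryArch).equals(normalize(jvmArch));
  }

  public static void main(String[] args) {
    // os.arch is a standard system property; a native JDK on Apple silicon reports "aarch64".
    String jvmArch = System.getProperty("os.arch");
    // Assumption from the error above: the packaged Gandiva JNI library is x86_64 only.
    System.out.println("JVM arch " + jvmArch
        + ", x86_64 library loadable: " + canLoad("x86_64", jvmArch));
  }
}
```

A test suite could use a check like this to skip the Gandiva-dependent tests on unsupported machines, for instance through JUnit 5's `Assumptions.assumeTrue`, rather than failing with `UnsatisfiedLinkError`.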


was (Author: jswenson):
For reference the linker error I get on an M1 mac when running the arrow tests 
is: 
{code:java}
FAILURE   2.3sec, org.apache.calcite.adapter.arrow.ArrowAdapterTest > 
testArrowProjectFieldsWithFloatFilter()
    java.lang.UnsatisfiedLinkError: 
/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8:
 
dlopen(/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8,
 0x0001): tried: 
'/private/var/folders/fj/63_6n5dx10n4b5x7jtdj6tvhgn/T/libgandiva_jni.dylib804580c2-6fe4-4294-bdbb-c0c7d9e582a8'
 (mach-o file, but is an incompatible architecture (have 'x86_64', need 
'arm64e'))
        at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1950)
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1832)
        at java.lang.Runtime.load0(Runtime.java:811)
        at java.lang.System.load(System.java:1088)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.loadGandivaLibraryFromJar(JniLoader.java:74)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.setupInstance(JniLoader.java:63)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.getInstance(JniLoader.java:53)
        at 
org.apache.arrow.gandiva.evaluator.JniLoader.getDefaultConfiguration(JniLoader.java:144)
        at org.apache.arrow.gandiva.evaluator.Filter.make(Filter.java:67) {code}
 

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: arrow_data.py
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h

[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2024-02-25 Thread Alessandro Solimando (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820512#comment-17820512
 ] 

Alessandro Solimando edited comment on CALCITE-2040 at 2/25/24 7:33 PM:


[~hongyuguo], I have left a (partial) review. I feel there is already enough 
to look at, and I will finish the review sometime next week.

Since you are the one currently working on it, it might make sense to assign 
the ticket to yourself and mark it as "in progress", wdyt?


was (Author: asolimando):
[~hongyuguo], I have left a (partial) review, there is enough to be looked at 
already I feel, I will finish the review sometime next week.

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: Julian Hyde
> Priority: Major
> Labels: pull-request-available
> Attachments: arrow_data.py
>
> Time Spent: 1.5h
> Remaining Estimate: 0h





[jira] [Comment Edited] (CALCITE-2040) Create adapter for Apache Arrow

2024-03-10 Thread hongyu guo (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-2040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825134#comment-17825134
 ] 

hongyu guo edited comment on CALCITE-2040 at 3/11/24 2:47 AM:
--

Fixed in 
[d4e8830|https://github.com/apache/calcite/commit/d4e88302e367b7f5a3b3da9d2e0f734320cef413].

Thanks to all contributors!


was (Author: JIRAUSER300840):
Fixed in 
[d4e8830|[https://github.com/apache/calcite/commit/d4e88302e367b7f5a3b3da9d2e0f734320cef413].]

 Thanks all contributors!

> Create adapter for Apache Arrow
> ---
>
> Key: CALCITE-2040
> URL: https://issues.apache.org/jira/browse/CALCITE-2040
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Assignee: hongyu guo
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.37.0
>
> Attachments: arrow_data.py
>
> Time Spent: 1.5h
> Remaining Estimate: 0h


