[jira] [Created] (ARROW-10764) [Rust] Inline small JSON and CSV test files

2020-11-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-10764:
--

 Summary: [Rust] Inline small JSON and CSV test files
 Key: ARROW-10764
 URL: https://issues.apache.org/jira/browse/ARROW-10764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Neville Dipale


Some of our tests use small CSV and JSON files, which we could inline in the 
code instead of adding more files to the test data directory.
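As a sketch of the pattern (names are illustrative, not taken from the Arrow codebase): a reader that is generic over `std::io::Read` can consume an inline string through a `Cursor`, so no file on disk is needed.

```rust
use std::io::{BufRead, Cursor};

// Inline CSV test data instead of adding another file to test/data.
const CSV_DATA: &str = "c_int,c_float\n1,1.1\n2,2.2\n";

// Consume the data the way a reader generic over `R: Read` would;
// a Cursor over the inline string stands in for a File in tests.
fn count_lines(data: &str) -> usize {
    Cursor::new(data).lines().map(|l| l.unwrap()).count()
}

fn main() {
    // Header plus two records.
    assert_eq!(count_lines(CSV_DATA), 3);
}
```

Arrow's Rust CSV reader accepts any `Read` implementation, so the same `Cursor` trick should apply directly in the affected tests.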



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10763) [Rust] Speed up take kernels

2020-11-28 Thread Jira
Daniël Heres created ARROW-10763:


 Summary: [Rust] Speed up take kernels
 Key: ARROW-10763
 URL: https://issues.apache.org/jira/browse/ARROW-10763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Daniël Heres


Speed up take kernels for non-null data
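One plausible direction, sketched with assumed names (this is not the actual kernel code): when the values array has a null count of zero, the take kernel can skip per-element validity checks and gather directly.

```rust
// Sketch of a take kernel fast path for non-null data (illustrative,
// not the real arrow crate API). Gathers values[indices[i]] for each i.
fn take_no_nulls(values: &[i32], indices: &[usize]) -> Vec<i32> {
    // No validity bitmap to consult, so this is a straight gather;
    // the general path would also probe a null buffer per element.
    indices.iter().map(|&i| values[i]).collect()
}

fn main() {
    assert_eq!(take_no_nulls(&[10, 20, 30, 40], &[3, 0, 2]), vec![40, 10, 30]);
}
```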



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10762) Configuration does not provide a mapping for array column

2020-11-28 Thread Benjamin Du (Jira)
Benjamin Du created ARROW-10762:
---

 Summary: Configuration does not provide a mapping for array column
 Key: ARROW-10762
 URL: https://issues.apache.org/jira/browse/ARROW-10762
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Benjamin Du


I tried to leverage `org.apache.arrow.adapter.jdbc.JdbcToArrow.sqlToArrow` to 
query a Hive table but got the following error message on array columns:

{{Configuration does not provide a mapping for array column 234}}

The error message makes me wonder whether it is possible to provide a 
customized configuration for array columns.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10761) [Rust] [DataFusion] Add SQL support for referencing fields in structs

2020-11-28 Thread Andy Grove (Jira)
Andy Grove created ARROW-10761:
--

 Summary: [Rust] [DataFusion] Add SQL support for referencing 
fields in structs
 Key: ARROW-10761
 URL: https://issues.apache.org/jira/browse/ARROW-10761
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Rust - DataFusion
Reporter: Andy Grove






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10760) [Rust] [DataFusion] Predicate push down does not support joins correctly

2020-11-28 Thread Andy Grove (Jira)
Andy Grove created ARROW-10760:
--

 Summary: [Rust] [DataFusion] Predicate push down does not support 
joins correctly
 Key: ARROW-10760
 URL: https://issues.apache.org/jira/browse/ARROW-10760
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust - DataFusion
Reporter: Andy Grove
 Fix For: 3.0.0


 
{code:java}
---- equijoin_implicit_syntax_with_filter stdout ----
thread 'equijoin_implicit_syntax_with_filter' panicked at 'Creating physical 
plan for 'SELECT t1_id, t1_name, t2_name FROM t1, t2 WHERE t1_id > 0 AND t1_id 
= t2_id AND t2_id < 99 ORDER BY t1_id': Sort: #t1_id ASC NULLS FIRST
  Projection: #t1_id, #t1_name, #t2_name
Join: t1_id = t2_id
  Filter: #t1_id Gt Int64(0) And #t2_id Lt Int64(99)
TableScan: t1 projection=Some([0, 1])
  Filter: #t1_id Gt Int64(0) And #t2_id Lt Int64(99)
TableScan: t2 projection=Some([0, 1]): 
ArrowError(InvalidArgumentError("Unable to get field named \"t2_id\". Valid 
fields: [\"t1_id\", \"t1_name\"]"))', datafusion/tests/sql.rs:1262:48
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
 {code}
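The plan above shows the whole filter (including `#t2_id Lt Int64(99)`) pushed below both scan inputs. A sketch of the needed fix, using a hypothetical helper rather than DataFusion's actual code: split the predicate into conjuncts and push each one only to the join side whose columns it references.

```rust
#[derive(Debug, PartialEq)]
enum Side {
    Left,  // push below the left input only
    Right, // push below the right input only
    Both,  // references both sides; must stay at or above the join
}

// Classify one conjunct by which join input owns the columns it uses.
fn classify(cols: &[&str], left: &[&str], right: &[&str]) -> Side {
    let uses_left = cols.iter().any(|c| left.contains(c));
    let uses_right = cols.iter().any(|c| right.contains(c));
    match (uses_left, uses_right) {
        (true, false) => Side::Left,
        (false, true) => Side::Right,
        _ => Side::Both,
    }
}

fn main() {
    let left = ["t1_id", "t1_name"];
    let right = ["t2_id", "t2_name"];
    // t1_id > 0 belongs below the t1 scan; t2_id < 99 below the t2 scan.
    assert_eq!(classify(&["t1_id"], &left, &right), Side::Left);
    assert_eq!(classify(&["t2_id"], &left, &right), Side::Right);
}
```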



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10759) [Rust][DataFusion] Implement support for casting string to date in sql expressions

2020-11-28 Thread Yordan Pavlov (Jira)
Yordan Pavlov created ARROW-10759:
-

 Summary: [Rust][DataFusion] Implement support for casting string 
to date in sql expressions
 Key: ARROW-10759
 URL: https://issues.apache.org/jira/browse/ARROW-10759
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust - DataFusion
Affects Versions: 2.0.0
Reporter: Yordan Pavlov


If DataFusion had support for creating date literals using a cast expression 
such as 
CAST('2019-01-01' AS DATE) this would allow direct (and therefore more 
efficient) comparison of date columns to scalar values (compared to 
representing dates as strings and then resorting to string comparison).
I already have a basic implementation that works, just have to add some more 
tests.
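Independent of the DataFusion internals, the core of such a cast is mapping an ISO date string to the days-since-epoch value that Arrow's `Date32` type stores. A self-contained sketch using the standard days-from-civil algorithm (illustrative, not the actual implementation):

```rust
// Days since 1970-01-01 for a proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let adj = if y >= 0 { y } else { y - 399 };
    let era = adj / 400;
    let yoe = y - era * 400;
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

// Parse "YYYY-MM-DD" into the i32 a Date32 array would hold.
fn parse_date32(s: &str) -> Option<i32> {
    let mut parts = s.split('-').map(|p| p.parse::<i64>().ok());
    let (y, m, d) = (parts.next()??, parts.next()??, parts.next()??);
    Some(days_from_civil(y, m, d) as i32)
}

fn main() {
    assert_eq!(parse_date32("1970-01-01"), Some(0));
    assert_eq!(parse_date32("2019-01-01"), Some(17897));
}
```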



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10758) Arrow Dataset Loading CSV format file from S3

2020-11-28 Thread Lynch (Jira)
Lynch created ARROW-10758:
-

 Summary: Arrow Dataset Loading CSV format file from S3
 Key: ARROW-10758
 URL: https://issues.apache.org/jira/browse/ARROW-10758
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 2.0.0
Reporter: Lynch


I am using `S3FileSystem` along with `CsvFileFormat` in the Arrow dataset API 
to load all CSV files under an S3 bucket.

The main test code is below:

 
{code:java}
// Template arguments and '&' characters were mangled in the original
// message; the reconstruction below assumes the obvious intent.
auto format = std::make_shared<arrow::dataset::CsvFileFormat>();
std::string output_path;
auto s3_file_system = arrow::fs::FileSystemFromUri("s3://test-csv-bucket",
    &output_path).ValueOrDie();

FileSystemFactoryOptions options;
options.partition_base_dir = output_path;

arrow::fs::FileSelector _file_selector;

ASSERT_OK_AND_ASSIGN(auto factory,
    FileSystemDatasetFactory::Make(s3_file_system,
        _file_selector, format, options));

ASSERT_OK_AND_ASSIGN(auto schema, factory->Inspect());

ASSERT_OK_AND_ASSIGN(auto dataset, factory->Finish(schema));

{code}
But it seems that when calling `ASSERT_OK_AND_ASSIGN(auto schema, 
factory->Inspect());` an exception is thrown while reading the file from the 
S3 bucket; the exception stack is as follows:

 

 
{code:java}
__pthread_kill 0x7fff70dc033a
pthread_kill 0x7fff70e7ce60
abort 0x7fff70d47808
malloc_vreport 0x7fff70e3d50b
malloc_report 0x7fff70e4040f
Aws::Free(void*) AWSMemory.cpp:97
std::__1::enable_if > >::value, void>::type 
Aws::Delete > 
>(std::__1::basic_iostream >*) AWSMemory.h:119
Aws::Utils::Stream::ResponseStream::ReleaseStream() ResponseStream.cpp:62
Aws::Utils::Stream::ResponseStream::~ResponseStream() ResponseStream.cpp:54
Aws::Utils::Stream::ResponseStream::~ResponseStream() ResponseStream.cpp:53
Aws::S3::Model::GetObjectResult::~GetObjectResult() GetObjectResult.h:30
Aws::S3::Model::GetObjectResult::~GetObjectResult() GetObjectResult.h:30
arrow::fs::(anonymous namespace)::ObjectInputFile::ReadAt(long long, long long, 
void*) s3fs.cc:724
arrow::fs::(anonymous namespace)::ObjectInputFile::ReadAt(long long, long long) 
s3fs.cc:735
arrow::dataset::OpenReader(arrow::dataset::FileSource const&, 
arrow::dataset::CsvFileFormat const&, 
std::__1::shared_ptr const&, arrow::MemoryPool*) 
file_csv.cc:119
arrow::dataset::CsvFileFormat::Inspect(arrow::dataset::FileSource const&) const 
file_csv.cc:182
arrow::dataset::FileSystemDatasetFactory::InspectSchemas(arrow::dataset::InspectOptions)
 discovery.cc:219
arrow::dataset::DatasetFactory::Inspect(arrow::dataset::InspectOptions) 
discovery.cc:41
{code}
 

Does the Arrow dataset API support reading CSV/Parquet/IPC files from `S3FileSystem`?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10757) [Rust] [CI] Sporadic failures due to disk filling up

2020-11-28 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-10757:
--

 Summary: [Rust] [CI] Sporadic failures due to disk filling up
 Key: ARROW-10757
 URL: https://issues.apache.org/jira/browse/ARROW-10757
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI, Rust
Reporter: Neville Dipale
Assignee: Neville Dipale


CI is failing due to the disk filling up, affecting almost all Rust PRs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10756) Fix redundant clone

2020-11-28 Thread Jira
Daniël Heres created ARROW-10756:


 Summary: Fix redundant clone
 Key: ARROW-10756
 URL: https://issues.apache.org/jira/browse/ARROW-10756
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Daniël Heres


It was introduced in a PR that was merged around the same time.
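For context (illustrative only, not the actual diff): a redundant clone typically means cloning values that are only read, where borrowing suffices.

```rust
// Before: cloning copies every String just to join them.
fn join_cloned(names: &[String]) -> String {
    names.iter().cloned().collect::<Vec<String>>().join(",")
}

// After: `join` works directly on the slice; no clone needed.
fn join_borrowed(names: &[String]) -> String {
    names.join(",")
}

fn main() {
    let names = vec!["a".to_string(), "b".to_string()];
    assert_eq!(join_cloned(&names), join_borrowed(&names));
}
```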



--
This message was sent by Atlassian Jira
(v8.3.4#803005)