Re: Julia implementation and integration with main apache arrow repository
Hi Jacob,

I'm very excited to see that the Julia implementation of Arrow has been restarted. Pkg.jl now seems to support packages that live in subdirectories; I believe the feature was added by https://github.com/JuliaLang/Pkg.jl/pull/1766 and https://github.com/JuliaRegistries/RegistryTools.jl/pull/31. As those pull requests show, you can tell Pkg.jl the location of the Julia package directory via the `subdir` parameter.

On Mon, Sep 14, 2020 at 4:33, Jacob Quinn wrote:
>
> Hello all,
>
> Hopefully this email works (I'm not super familiar with using mailing lists like this).
>
> Over the past few weeks, I've been working on a pure Julia implementation to support serializing/deserializing the arrow format for Julia. The code in its current state can be found here: https://github.com/JuliaData/Arrow.jl.
>
> I believe the code has reached an initial beta-level quality, and I just finished writing the arrow <-> json integration testing code that archery expects. I haven't worked on actual archery integration yet, but it should just be a matter of adding a tester_julia.py file that knows how to invoke the test/integrationtest.jl file with similar arguments as the tester_go.py file.
>
> This email has a couple purposes:
> * Signal that the julia code is somewhat ready to be used/integrated in the main repo
> * Ask for advice/direction on actually integrating with the apache arrow github repository
>
> For the latter, in particular, I imagine keeping an initial PR as minimal as possible is desirable. I need to follow up with the core pkg devs for Julia, but I've been told it's possible/not hard to have a Julia package "live" inside a monorepo; I just haven't figured out the details of what that means on the Julia General package registry side of things. But I'm happy to figure that out, and it shouldn't really affect the merging of Julia code into the apache arrow github.
> So my plan is roughly:
> * Fork/make a branch of the apache arrow repo
> * Add in the Julia code from the link I mentioned above
> * Add necessary files/integration in archery to run Julia integration tests alongside other languages
> * Do initial merge into apache arrow?
>
> If there are other initial requirements core devs would expect, just let me know, but I imagine that updating the implementation matrix, for example, can be done afterwards as follow-up.
>
> Excited to have Julia more officially integrated here!
>
> Cheers,
>
> -Jacob
> https://github.com/quinnj
> https://twitter.com/quinn_jacobd

--
Regards,
Kenta Murata
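For reference, once the monorepo layout is registered, installing a package that lives in a repository subdirectory can be sketched roughly as below. This is a hypothetical illustration of the `subdir` mechanism discussed above; the exact path `julia/Arrow` is an assumption, not something stated in this thread.

```julia
using Pkg

# Hypothetical: add a package that lives in a subdirectory of a monorepo.
# The `subdir` keyword corresponds to the Pkg.jl feature linked above.
Pkg.add(url="https://github.com/apache/arrow", subdir="julia/Arrow")
```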
Re: [DISCUSS][C++] Group by operation for RecordBatch and Table
Hi Wes,

Thank you very much for the detailed explanation of your thoughts. I'll need to learn the state of the art of query engines you pointed out, whether I end up contributing to the C++ query engine or just writing bindings for it. I'm studying the articles and the code.

Regards,
Kenta Murata

On Thu, Aug 6, 2020 at 4:17 Wes McKinney wrote:
> I see there's a bunch of additional aggregation code in Dremio that might serve as inspiration (some of which is related to distributed aggregation, so may not be relevant):
>
> https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/sabot/op/aggregate
>
> Maybe Andy or one of the other active Rust DataFusion developers can comment on the approach taken for hash aggs there.
>
> On Wed, Aug 5, 2020 at 1:52 PM Wes McKinney wrote:
> >
> > hi Kenta,
> >
> > Yes, I think it only makes sense to implement this in the context of the query engine project. Here's a list of assorted thoughts about it:
> >
> > * I have been mentally planning to follow the Vectorwise-type query engine architecture that's discussed in [1] [2] and many other academic papers. I believe this is how some other current generation open source columnar query engines work, such as Dremio [3] and DuckDB [4][5].
> > * Hash (aka "group") aggregations need to be able to process arbitrary expressions, not only a plain input column. So it's not enough to be able to compute "sum(x) group by y" where "x" and "y" are fields in a RecordBatch; we need to be able to compute "$AGG_FUNC($EXPR) GROUP BY $GROUP_EXPR_1, $GROUP_EXPR_2, ..." where $EXPR / $GROUP_EXPR_1 / ... are any column expressions computed from the input relations (keep in mind that an aggregation could apply to a stream of record batches produced by a join). In any case, expression evaluation is a closely-related task and should be implemented ASAP.
> > * Hash aggregation functions themselves should probably be introduced as a new Function type in arrow::compute. I don't think it would be appropriate to use the existing "SCALAR_AGGREGATE" functions; instead we should introduce a new HASH_AGGREGATE function type that accepts input data to be aggregated along with an array of pre-computed bucket ids (which are computed by probing the HT). So rather than Update(state, args) like we have for scalar aggregate, the primary interface for group aggregation is Update(state, bucket_ids, args).
> > * The HashAggregation operator should be able to process an arbitrary iterator of record batches.
> > * We will probably want to adapt an existing or implement a new concurrent hash table so that aggregations can be performed in parallel without requiring a post-aggregation merge step.
> > * There's some general support machinery for hashing multiple fields and then doing efficient vectorized hash table probes (to assign aggregation bucket ids to each row position).
> >
> > I think it is worth investing the effort to build something that is reasonably consistent with the "state of the art" in database systems (at least according to what we are able to build with our current resources) rather than building something more crude that has to be replaced with a new implementation later.
> >
> > I'd like to help personally with this work (particularly since the natural next step with my recent work in arrow/compute is to implement expression evaluation) but I won't have significant bandwidth for it until later this month or early September. If someone feels that they sufficiently understand the state of the art for this type of workload and wants to help with laying down the abstract C++ APIs for Volcano-style query execution and an implementation of hash aggregation, that sounds great.
> > Thanks,
> > Wes
> >
> > [1]: https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf
> > [2]: https://github.com/TimoKersten/db-engine-paradigms
> > [3]: https://github.com/dremio/dremio-oss/tree/master/sabot/kernel/src/main/java/com/dremio/sabot/op/aggregate/hash
> > [4]: https://github.com/cwida/duckdb/blob/master/src/include/duckdb/execution/aggregate_hashtable.hpp
> > [5]: https://github.com/cwida/duckdb/blob/master/src/execution/aggregate_hashtable.cpp
> >
> > On Wed, Aug 5, 2020 at 10:23 AM Kenta Murata wrote:
> > >
> > > Hi folks,
> > >
> > > Red Arrow, the Ruby binding of Arrow GLib, implements group
[DISCUSS][C++] Group by operation for RecordBatch and Table
Hi folks,

Red Arrow, the Ruby binding of Arrow GLib, implements grouped aggregation features for RecordBatch and Table. Because these features are written in Ruby, they are too slow for large data, and we need to make them much faster. To improve their speed, they should be written in C++ and live in Arrow C++ instead of Red Arrow.

Is anyone working on implementing a group-by operation for RecordBatch and Table in Arrow C++? If no one has worked on it, I would like to try it.

By the way, I found that the grouped aggregation feature is mentioned in the design document of the Arrow C++ Query Engine. Is the Query Engine, rather than Arrow C++ Core, the suitable place to implement the group-by operation?
Re: [DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type
Agreed. I made ARROW-9642 and its pull request:
https://github.com/apache/arrow/pull/7898

On Tue, Aug 4, 2020 at 6:32, Wes McKinney wrote:
>
> It seems useful to use the index type to set the starting bit width of the builder. I guess we can preserve the behavior of expanding to the next bit width when overflowing the smaller integer types.
>
> On Sun, Aug 2, 2020 at 9:32 PM Kenta Murata wrote:
> >
> > Hi folks,
> >
> > The arrow::MakeBuilder function with a dictionary type creates a dictionary builder with AdaptiveIntBuilder, ignoring the bit width of the DictionaryType's index type. I want to know whether this behavior is intentional or not.
> >
> > I think this feature is useful when I want to use a dictionary builder with AdaptiveIntBuilder. But the result of the following code is a little bit surprising.
> >
> > ```cpp
> > #include <arrow/api.h>
> > #include <arrow/util/logging.h>
> > #include <iostream>
> >
> > int main(int argc, char **argv) {
> >   auto dict_type = arrow::dictionary(arrow::int32(), arrow::utf8());
> >   std::unique_ptr<arrow::ArrayBuilder> out;
> >   ARROW_CHECK_OK(arrow::MakeBuilder(arrow::default_memory_pool(), dict_type, &out));
> >   std::cout << "type: " << out->type()->ToString() << std::endl;
> >   return 0;
> > }
> > ```
> >
> > You can see the message below when executing this code:
> >
> > type: dictionary<values=string, indices=int8, ordered=0>
> >
> > I got `indices=int8` from a dictionary type with an int32 index type. I guess most people expect to get `indices=int32` here.
> >
> > --
> > Kenta Murata

--
Regards,
Kenta Murata
[DISCUSS][C++] MakeBuilder with a DictionaryType ignores the bit-width of the index type
Hi folks,

The arrow::MakeBuilder function with a dictionary type creates a dictionary builder with AdaptiveIntBuilder, ignoring the bit width of the DictionaryType's index type. I want to know whether this behavior is intentional or not.

I think this feature is useful when I want to use a dictionary builder with AdaptiveIntBuilder. But the result of the following code is a little bit surprising.

```cpp
#include <arrow/api.h>
#include <arrow/util/logging.h>
#include <iostream>

int main(int argc, char **argv) {
  auto dict_type = arrow::dictionary(arrow::int32(), arrow::utf8());
  std::unique_ptr<arrow::ArrayBuilder> out;
  ARROW_CHECK_OK(arrow::MakeBuilder(arrow::default_memory_pool(), dict_type, &out));
  std::cout << "type: " << out->type()->ToString() << std::endl;
  return 0;
}
```

You can see the message below when executing this code:

type: dictionary<values=string, indices=int8, ordered=0>

I got `indices=int8` from a dictionary type with an int32 index type. I guess most people expect to get `indices=int32` here.

--
Kenta Murata
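The adaptive widening discussed in this thread, starting from a small index width and expanding to the next bit width on overflow, can be illustrated with a small helper. This is a hypothetical sketch of the width-selection logic only, not the actual AdaptiveIntBuilder implementation.

```cpp
#include <cstdint>

// Return the smallest signed-integer bit width able to hold dictionary
// index `value`, mimicking how an adaptive builder widens as the
// dictionary grows: int8 -> int16 -> int32 -> int64.
int RequiredIndexBitWidth(int64_t value) {
  if (value <= INT8_MAX) return 8;
  if (value <= INT16_MAX) return 16;
  if (value <= INT32_MAX) return 32;
  return 64;
}
```

Under the behavior proposed above (ARROW-9642), the builder would instead start at the bit width of the declared index type, e.g. 32 for arrow::int32(), and only widen beyond that if needed.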
Re: 0.17 release blog post: help needed
I've edited the Ruby and C GLib parts. Kou and Shiro will check them later.

On Mon, Apr 20, 2020 at 11:09, Wes McKinney wrote:
>
> I made a pass through the changelog and added a bunch of TODOs related to C++. In general, as a reminder, since the releases are growing large we should try to present as compact a high-level summary as possible in these blog posts to convey some of the highlights of our labors (so there is likely no need to write out any JIRA numbers; people can look at the changelog for that). I'll spend some more time on the blog post after others have had a chance to take a pass through.
>
> On Sat, Apr 18, 2020 at 12:13 PM Neal Richardson wrote:
> >
> > Hi all,
> > Since it looks like we're close to releasing 0.17, we need to fill in the details for our blog post announcement. I've started a document here:
> > https://docs.google.com/document/d/16UKZtvL49o8nCDN8JU3Ut6y76Y9d8-4qXv5vFv7aNvs/edit#heading=h.kqqacbm2lpv8
> >
> > Please fill in the details for the parts of the project you're close to. I'll handle wrapping this up in the usual boilerplate when we're done.
> >
> > Thanks,
> > Neal

--
Regards,
Kenta Murata
[jira] [Created] (ARROW-8343) [GLib] Add GArrowRecordBatchIterator
Kenta Murata created ARROW-8343:
---
Summary: [GLib] Add GArrowRecordBatchIterator
Key: ARROW-8343
URL: https://issues.apache.org/jira/browse/ARROW-8343
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8073) [GLib] Add binding of arrow::fs::PathForest
Kenta Murata created ARROW-8073:
---
Summary: [GLib] Add binding of arrow::fs::PathForest
Key: ARROW-8073
URL: https://issues.apache.org/jira/browse/ARROW-8073
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7739) [GLib] Use placement new to initialize shared_ptr object in private structs
Kenta Murata created ARROW-7739:
---
Summary: [GLib] Use placement new to initialize shared_ptr object in private structs
Key: ARROW-7739
URL: https://issues.apache.org/jira/browse/ARROW-7739
Project: Apache Arrow
Issue Type: Task
Components: GLib
Reporter: Kenta Murata
[jira] [Created] (ARROW-7730) [GLib] Add Duration type support
Kenta Murata created ARROW-7730:
---
Summary: [GLib] Add Duration type support
Key: ARROW-7730
URL: https://issues.apache.org/jira/browse/ARROW-7730
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
[jira] [Created] (ARROW-7698) [Format][C++] Add tensor and sparse tensor supports in File metadata
Kenta Murata created ARROW-7698:
---
Summary: [Format][C++] Add tensor and sparse tensor supports in File metadata
Key: ARROW-7698
URL: https://issues.apache.org/jira/browse/ARROW-7698
Project: Apache Arrow
Issue Type: New Feature
Components: C++, Format
Reporter: Kenta Murata
[jira] [Created] (ARROW-7515) [C++] Rename nonexistent and non_existent to not_found
Kenta Murata created ARROW-7515:
---
Summary: [C++] Rename nonexistent and non_existent to not_found
Key: ARROW-7515
URL: https://issues.apache.org/jira/browse/ARROW-7515
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7504) [GLib] Introduce value-returning garrow::check
Kenta Murata created ARROW-7504:
---
Summary: [GLib] Introduce value-returning garrow::check
Key: ARROW-7504
URL: https://issues.apache.org/jira/browse/ARROW-7504
Project: Apache Arrow
Issue Type: Improvement
Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata

Following this discussion: https://github.com/apache/arrow/pull/6066/files#r363367450
[jira] [Created] (ARROW-7445) [GLib] Add HadoopFileSystem support
Kenta Murata created ARROW-7445:
---
Summary: [GLib] Add HadoopFileSystem support
Key: ARROW-7445
URL: https://issues.apache.org/jira/browse/ARROW-7445
Project: Apache Arrow
Issue Type: Sub-task
Components: GLib
Reporter: Kenta Murata
[jira] [Created] (ARROW-7444) [GLib] Add LocalFileSystem support
Kenta Murata created ARROW-7444:
---
Summary: [GLib] Add LocalFileSystem support
Key: ARROW-7444
URL: https://issues.apache.org/jira/browse/ARROW-7444
Project: Apache Arrow
Issue Type: Sub-task
Components: GLib
Reporter: Kenta Murata
[jira] [Created] (ARROW-7443) [GLib] Add binding of arrow::fs
Kenta Murata created ARROW-7443:
---
Summary: [GLib] Add binding of arrow::fs
Key: ARROW-7443
URL: https://issues.apache.org/jira/browse/ARROW-7443
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
[jira] [Created] (ARROW-7421) [C++] Support creating SparseCSRMatrix and SparseCSCMatrix from 0d and 1d Tensors
Kenta Murata created ARROW-7421:
---
Summary: [C++] Support creating SparseCSRMatrix and SparseCSCMatrix from 0d and 1d Tensors
Key: ARROW-7421
URL: https://issues.apache.org/jira/browse/ARROW-7421
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7420) [C++] Migrate internal functions of SparseTensor to Result-returning version
Kenta Murata created ARROW-7420:
---
Summary: [C++] Migrate internal functions of SparseTensor to Result-returning version
Key: ARROW-7420
URL: https://issues.apache.org/jira/browse/ARROW-7420
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7419) [Python] Support SparseCSCMatrix
Kenta Murata created ARROW-7419:
---
Summary: [Python] Support SparseCSCMatrix
Key: ARROW-7419
URL: https://issues.apache.org/jira/browse/ARROW-7419
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Reporter: Kenta Murata
[jira] [Created] (ARROW-7371) [GLib] Add Datasets binding
Kenta Murata created ARROW-7371:
---
Summary: [GLib] Add Datasets binding
Key: ARROW-7371
URL: https://issues.apache.org/jira/browse/ARROW-7371
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7369) [GLib] Add garrow_table_combine_chunks
Kenta Murata created ARROW-7369:
---
Summary: [GLib] Add garrow_table_combine_chunks
Key: ARROW-7369
URL: https://issues.apache.org/jira/browse/ARROW-7369
Project: Apache Arrow
Issue Type: New Feature
Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7306) [C++] Add Result-returning version of FileSystemFromUri
Kenta Murata created ARROW-7306:
---
Summary: [C++] Add Result-returning version of FileSystemFromUri
Key: ARROW-7306
URL: https://issues.apache.org/jira/browse/ARROW-7306
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7297) [C++] Add value accessor in sparse tensor class
Kenta Murata created ARROW-7297:
---
Summary: [C++] Add value accessor in sparse tensor class
Key: ARROW-7297
URL: https://issues.apache.org/jira/browse/ARROW-7297
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kenta Murata

{{SparseTensor}} can have a value accessor like {{Tensor::Value}}.
[jira] [Created] (ARROW-7291) [Dev] Fix FORMAT_DIR in update-flatbuffers.sh
Kenta Murata created ARROW-7291:
---
Summary: [Dev] Fix FORMAT_DIR in update-flatbuffers.sh
Key: ARROW-7291
URL: https://issues.apache.org/jira/browse/ARROW-7291
Project: Apache Arrow
Issue Type: Bug
Components: Developer Tools
Reporter: Kenta Murata
Assignee: Kenta Murata
[jira] [Created] (ARROW-7037) [C++] Compile error on the combination of protobuf >= 3.9 and clang
Kenta Murata created ARROW-7037:
---
Summary: [C++] Compile error on the combination of protobuf >= 3.9 and clang
Key: ARROW-7037
URL: https://issues.apache.org/jira/browse/ARROW-7037
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata

I encountered the following compile error on the combination of protobuf 3.10.0 and clang (Xcode 11).

{noformat}
[13/26] Building CXX object c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o
FAILED: c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o
/Applications/Xcode_11.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -Ic++/include -I/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include -I/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem c++/libs/thirdparty/zlib_ep-install/include -isystem c++/libs/thirdparty/lz4_ep-install/include -Qunused-arguments -fcolor-diagnostics -ggdb -O0 -g -fPIC -Wno-zero-as-null-pointer-constant -Wno-inconsistent-missing-destructor-override -Wno-error=undef -std=c++11 -Weverything -Wno-c++98-compat -Wno-missing-prototypes -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default -Wno-missing-noreturn -Wno-unknown-pragmas -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Wno-c++2a-compat -Werror -std=c++11 -Weverything -Wno-c++98-compat -Wno-missing-prototypes -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default -Wno-missing-noreturn -Wno-unknown-pragmas -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Wno-c++2a-compat -Werror -O0 -g -MD -MT c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o -MF c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o.d -o c++/src/CMakeFiles/orc.dir/wrap/orc-proto-wrapper.cc.o -c /Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/wrap/orc-proto-wrapper.cc
In file included from /Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/wrap/orc-proto-wrapper.cc:44:
c++/src/orc_proto.pb.cc:959:145: error: possible misuse of comma operator here [-Werror,-Wcomma]
static bool dynamic_init_dummy_orc_5fproto_2eproto = ( ::PROTOBUF_NAMESPACE_ID::internal::AddDescriptors(&descriptor_table_orc_5fproto_2eproto), true);
                                                                                                                                                ^
c++/src/orc_proto.pb.cc:959:57: note: cast expression to void to silence warning
static bool dynamic_init_dummy_orc_5fproto_2eproto = ( ::PROTOBUF_NAMESPACE_ID::internal::AddDescriptors(&descriptor_table_orc_5fproto_2eproto), true);
                                                        ^~~~
                                                        static_cast<void>( )
1 error generated.
{noformat}

This may be due to a bug in protobuf filed as https://github.com/protocolbuffers/protobuf/issues/6619.
[jira] [Created] (ARROW-7036) [C++] Version up ORC to avoid compile errors
Kenta Murata created ARROW-7036:
---
Summary: [C++] Version up ORC to avoid compile errors
Key: ARROW-7036
URL: https://issues.apache.org/jira/browse/ARROW-7036
Project: Apache Arrow
Issue Type: Task
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata

I encountered compile errors due to {{-Wshadow-field}} like below:

{noformat}
[1/4] Building CXX object c++/src/CMakeFiles/orc.dir/Vector.cc.o
FAILED: c++/src/CMakeFiles/orc.dir/Vector.cc.o
/Applications/Xcode_11.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -Ic++/include -I/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include -I/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem c++/libs/thirdparty/zlib_ep-install/include -isystem c++/libs/thirdparty/lz4_ep-install/include -Qunused-arguments -fcolor-diagnostics -ggdb -O0 -g -fPIC -Wno-zero-as-null-pointer-constant -Wno-inconsistent-missing-destructor-override -Wno-error=undef -std=c++11 -Weverything -Wno-c++98-compat -Wno-missing-prototypes -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default -Wno-missing-noreturn -Wno-unknown-pragmas -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Werror -std=c++11 -Weverything -Wno-c++98-compat -Wno-missing-prototypes -Wno-c++98-compat-pedantic -Wno-padded -Wno-covered-switch-default -Wno-missing-noreturn -Wno-unknown-pragmas -Wno-gnu-zero-variadic-macro-arguments -Wconversion -Werror -O0 -g -MD -MT c++/src/CMakeFiles/orc.dir/Vector.cc.o -MF c++/src/CMakeFiles/orc.dir/Vector.cc.o.d -o c++/src/CMakeFiles/orc.dir/Vector.cc.o -c /Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/Vector.cc
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/Vector.cc:59:45: error: parameter 'capacity' shadows member inherited from type 'ColumnVectorBatch' [-Werror,-Wshadow-field]
  LongVectorBatch::LongVectorBatch(uint64_t capacity, MemoryPool& pool
                                            ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include/orc/Vector.hh:46:14: note: declared here
    uint64_t capacity;
             ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/Vector.cc:87:49: error: parameter 'capacity' shadows member inherited from type 'ColumnVectorBatch' [-Werror,-Wshadow-field]
  DoubleVectorBatch::DoubleVectorBatch(uint64_t capacity, MemoryPool& pool
                                                ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include/orc/Vector.hh:46:14: note: declared here
    uint64_t capacity;
             ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/Vector.cc:115:49: error: parameter 'capacity' shadows member inherited from type 'ColumnVectorBatch' [-Werror,-Wshadow-field]
  StringVectorBatch::StringVectorBatch(uint64_t capacity, MemoryPool& pool
                                                ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include/orc/Vector.hh:46:14: note: declared here
    uint64_t capacity;
             ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/src/Vector.cc:407:55: error: parameter 'capacity' shadows member inherited from type 'ColumnVectorBatch' [-Werror,-Wshadow-field]
  TimestampVectorBatch::TimestampVectorBatch(uint64_t capacity,
                                                      ^
/Users/mrkn/src/github.com/apache/arrow/cpp/build.debug/orc_ep-prefix/src/orc_ep/c++/include/orc/Vector.hh:46:14: note: declared here
    uint64_t capacity;
             ^
4 errors generated.
{noformat}

Upgrading ORC to 1.5.7 will fix these errors. I used Xcode 11.1 on macOS Mojave.
[jira] [Created] (ARROW-6814) [C++] Resolve compiler warnings occurred on release build
Kenta Murata created ARROW-6814:
---
Summary: [C++] Resolve compiler warnings occurred on release build
Key: ARROW-6814
URL: https://issues.apache.org/jira/browse/ARROW-6814
Project: Apache Arrow
Issue Type: Task
Components: C++, C++ - Gandiva
Reporter: Kenta Murata
Assignee: Kenta Murata

I encountered some compiler warnings on a release build when I used gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1).
https://gist.github.com/mrkn/f7739edb301988a24e9d6066410b0625
[jira] [Created] (ARROW-6508) [C++] Add Tensor and SparseTensor factory function with validations
Kenta Murata created ARROW-6508:
---
Summary: [C++] Add Tensor and SparseTensor factory function with validations
Key: ARROW-6508
URL: https://issues.apache.org/jira/browse/ARROW-6508
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kenta Murata

Currently Tensor and SparseTensor only have their constructors, but no factory functions that validate the parameters. We need such factory functions for creating Tensor and SparseTensor from parameters supplied by an external source.
[jira] [Created] (ARROW-6505) [Website] Add new committers
Kenta Murata created ARROW-6505:
---
Summary: [Website] Add new committers
Key: ARROW-6505
URL: https://issues.apache.org/jira/browse/ARROW-6505
Project: Apache Arrow
Issue Type: Improvement
Components: Website
Reporter: Kenta Murata
Assignee: Kenta Murata

I'd like to add the new committers to the committer list.
[jira] [Created] (ARROW-6503) [C++] Add an argument of memory pool object to SparseTensorConverter
Kenta Murata created ARROW-6503:
---
Summary: [C++] Add an argument of memory pool object to SparseTensorConverter
Key: ARROW-6503
URL: https://issues.apache.org/jira/browse/ARROW-6503
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata

According to the comment https://github.com/apache/arrow/pull/5290#discussion_r322244745, we need to have variants of some functions for supplying a memory pool object to the SparseTensorConverter function.
[jira] [Created] (ARROW-6501) [Format][C++] Remove non_zero_length field from SparseIndex
Kenta Murata created ARROW-6501:
---
Summary: [Format][C++] Remove non_zero_length field from SparseIndex
Key: ARROW-6501
URL: https://issues.apache.org/jira/browse/ARROW-6501
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Format
Reporter: Kenta Murata
Assignee: Kenta Murata

We can remove the non_zero_length field from SparseIndex because it can be derived from the shape of the indices tensor.
[jira] [Created] (ARROW-6489) [Developer][Documentation] Fix merge script and readme
Kenta Murata created ARROW-6489:
---
Summary: [Developer][Documentation] Fix merge script and readme
Key: ARROW-6489
URL: https://issues.apache.org/jira/browse/ARROW-6489
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools
Reporter: Kenta Murata
Assignee: Kenta Murata

The following things should be fixed:
- merge_arrow_pr.py shouldn't be affected by git's merge.ff value.
- README should describe the information of APACHE_JIRA_USERNAME and APACHE_JIRA_PASSWORD.
- README should describe that users need to install the requests and jira libraries before running merge_arrow_pr.py.
Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data
Thanks for responding. I understand that ExtensionType is suitable for handling character encoding. I'll try to draft and propose a specification and implementation of such an extension type.

Regards,
Kenta Murata

On Thu, Sep 5, 2019 at 7:56, Wes McKinney wrote:
>
> I opened https://issues.apache.org/jira/browse/ARROW-6455. It might make sense to define a common ExtensionType metadata in case multiple implementations decide they need this.
>
> On Tue, Sep 3, 2019 at 10:35 PM Micah Kornfield wrote:
> >
> > This might be bike-shedding but I agree we should attempt to use extension types for this use-case. I would expect something like:
> > ARROW:extension:name=NonUtf8String
> > ARROW:extension:metadata = "{\"iso-charset\": \"ISO-8859-10\"}"
> >
> > The latter's value being a JSON-encoded string, which captures the character set.
> >
> > Thanks,
> > Micah
> >
> > On Tue, Sep 3, 2019 at 6:59 PM Sutou Kouhei wrote:
> > >
> > > Hi,
> > >
> > > > If people can constrain to use UTF-8 for all the string data, StringArray is enough for them. But if they cannot unify the character encoding of string data in UTF-8, should Apache Arrow provide the standard way of character encoding management?
> > >
> > > I think that Apache Arrow users should convert their string data to UTF-8 in their applications. If Apache Arrow only supports UTF-8 strings, Apache Arrow users can process string data without converting encodings between multiple systems. I think no-conversion (zero-copy) use is the Apache Arrow way.
> > >
> > > > My opinion is that Apache Arrow must have the standard way in both its format and its API. The reason is below:
> > > >
> > > > (1) Currently, when we use MySQL or PostgreSQL as the data source of record batch streams, we will lose the information of character encodings the original data have.
> > >
> > > Both MySQL and PostgreSQL provide an encoding conversion feature.
> > > So we can convert the original data to UTF-8.
> > >
> > > MySQL: CONVERT function
> > > https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert
> > >
> > > PostgreSQL: convert_to function
> > > https://www.postgresql.org/docs/11/functions-string.html#id-1.5.8.9.7.2.2.8.1.1
> > >
> > > If we need to support non-UTF-8 encodings, I like the NonUTF8String (or similar) extension type and metadata approach. I prefer "ARROW:encoding" rather than "ARROW:charset" for the metadata key too.
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In "[DISCUSS][FORMAT] Concerning about character encoding of binary string data" on Mon, 2 Sep 2019 17:39:22 +0900, Kenta Murata wrote:
> > >
> > > > [Abstract]
> > > > When we have string data encoded in a character encoding other than UTF-8, we must use a BinaryArray for the data. But Apache Arrow doesn't provide a way to specify what character encoding is used in a BinaryArray. In this mail, I'd like to discuss how Apache Arrow could provide a way to manage a character encoding in a BinaryArray.
> > > >
> > > > I'd appreciate any comments or suggestions.
> > > >
> > > > [Long description]
> > > > Apache Arrow has a specialized type for UTF-8 encoded strings but doesn't have types for other character encodings, such as ISO-8859-x and Shift_JIS. We need to manage what character encoding is used in a binary string array outside of the arrays themselves, such as in metadata.
> > > >
> > > > In the Datasets project, one of the goals is to support database protocols. Some databases support many character encodings, each in its own manner. For example, PostgreSQL supports specifying the character encoding used for each database, and MySQL allows us to specify character encodings separately at each level: database, table, and column.
> > > > > > > > I have a concern about how Apache Arrow should provide a way to specify character encodings for values in arrays.
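As an illustration of the convention Micah sketches in the quoted thread, the metadata pair could be built like this. This is a minimal sketch: the key names and the JSON-encoded metadata value come from his example, but how these pairs are attached to a Field's custom metadata is implementation-specific and not shown here.

```python
import json

# Key/value pairs from Micah's example above. The charset name inside the
# metadata value is JSON-encoded, as he suggests. Attaching these pairs to
# a Field's custom metadata is left to each Arrow implementation.
charset = "ISO-8859-10"
extension_metadata = {
    "ARROW:extension:name": "NonUtf8String",
    "ARROW:extension:metadata": json.dumps({"iso-charset": charset}),
}
```

Keeping the value JSON-encoded means consumers that don't know the extension can still carry the metadata through unchanged.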
Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and Neal Richardson
Thank you very much everyone! I'm very happy to join this community. Fri, Sep 6, 2019 12:39 Micah Kornfield : > > Congrats everyone. > > On Thu, Sep 5, 2019 at 7:06 PM Ji Liu wrote: > > > Congratulations! > > > > Thanks, > > Ji Liu > > > > > > -- > > From:Fan Liya > > Send Time:Fri, Sep 6, 2019 09:28 > > To:dev > > Subject:Re: [ANNOUNCE] New committers: Ben Kietzman, Kenta Murata, and > > Neal Richardson > > > > Big congratulations to Ben, Kenta and Neal! > > > > Best, > > Liya Fan > > > > On Fri, Sep 6, 2019 at 5:33 AM Wes McKinney wrote: > > > > > hi all, > > > > > > on behalf of the Arrow PMC, I'm pleased to announce that Ben, Kenta, > > > and Neal have accepted invitations to become Arrow committers. Welcome > > > and thank you for all your contributions! > > > > > -- Kenta Murata OpenPGP FP = 1D69 ADDE 081C 9CC2 2E54 98C1 CEFE 8AFB 6081 B062 I wrote a book!! "Ruby 逆引きレシピ" (Ruby Reverse-Lookup Recipes) http://www.amazon.co.jp/dp/4798119881/mrkn-22 E-mail: m...@mrkn.jp twitter: http://twitter.com/mrkn/ blog: http://d.hatena.ne.jp/mrkn/
[DISCUSS][FORMAT] Concerning about character encoding of binary string data
[Abstract] When we have string data encoded in a character encoding other than UTF-8, we must use a BinaryArray for the data. But Apache Arrow doesn’t provide a way to specify which character encoding is used in a BinaryArray. In this mail, I’d like to discuss how Apache Arrow could provide a way to manage a character encoding in a BinaryArray. I’d appreciate any comments or suggestions. [Long description] Apache Arrow has a specialized type for UTF-8-encoded strings but doesn’t have types for other character encodings, such as ISO-8859-x and Shift_JIS. We need to manage which character encoding is used in a binary string array outside the array itself, such as in metadata. In the Datasets project, one of the goals is to support database protocols. Some databases support many character encodings, each in its own manner. For example, PostgreSQL allows specifying which character encoding is used for each database, and MySQL allows us to specify character encodings separately at each level: database, table, and column. I have a concern about how Apache Arrow should provide a way to specify character encodings for values in arrays. If people can constrain all their string data to UTF-8, StringArray is enough for them. But if they cannot unify the character encoding of their string data to UTF-8, should Apache Arrow provide a standard way to manage character encodings? An example use of Apache Arrow in such a case is application to the internal data of an O/R mapper library, such as ActiveRecord of Ruby on Rails. My opinion is that Apache Arrow must have a standard way in both its format and its API.
The reasons are below: (1) Currently, when we use MySQL or PostgreSQL as the data source of record batch streams, we lose the information about the character encodings the original data have. (2) Without a standard way of character encoding management, we have to struggle to support character encoding handling for each combination of systems, which does not fit Apache Arrow’s philosophy. (3) We cannot support character encoding handling at the language-binding level if Apache Arrow doesn’t provide standard APIs for character encoding management. There are two options for managing a character encoding in a BinaryArray. The first way is introducing an optional character_encoding field in BinaryType. The second way is using the custom_metadata field to supply the character encoding name. If we use custom_metadata, we should decide on the key for this information. I guess “charset” is a good candidate for the key because it is widely used for specifying which character encoding is used. The value must be the name of a character encoding, such as “UTF-8” or “Windows-31J”. It would be better if we could decide on canonical encoding names, but I guess that is hard work because many systems use the same name for different encodings. For example, “Shift_JIS” means either IANA’s Shift_JIS or Windows-31J; they use the same coding rule but the corresponding character sets are slightly different. See the spreadsheet [1] for the correspondence of character encoding names between MySQL, PostgreSQL, Ruby, Python, IANA [3], and the Encoding standard of WHATWG [4]. If we introduce a new optional field for the character encoding information in BinaryType, I recommend making this new field a string that holds the name of a character encoding. It is also possible to make the field an integer holding an enum value, but I don’t know of a good standard for enum values of character encodings.
IANA manages MIBenum [2], though the registered character encodings [3] are not sufficient for our requirements, I think. I prefer the second way because the first way can attach the character encoding information only to a Field, not to a BinaryArray. [1] https://docs.google.com/spreadsheets/d/1D0xlI5r2wJUV45aTY1q2TwqD__v7acmd8FOfr8xSOVQ/edit?usp=sharing [2] https://tools.ietf.org/html/rfc3808 [3] https://www.iana.org/assignments/character-sets/character-sets.xhtml [4] https://encoding.spec.whatwg.org/ -- Kenta Murata
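Both the convert-at-the-boundary approach suggested in the earlier thread and the Shift_JIS/Windows-31J naming pitfall mentioned above can be seen in a few lines of Python; the strings here are illustrative.

```python
# Convert legacy-encoded bytes to UTF-8 once, at the application boundary,
# so a StringArray can hold them. Python's "cp932" codec is Windows-31J.
legacy_bytes = "日本語".encode("cp932")          # bytes as they might come from a database
utf8_bytes = legacy_bytes.decode("cp932").encode("utf-8")

# The naming pitfall: U+2460 "①" is encodable in Windows-31J (cp932)
# but not in strict Shift_JIS, even though the two share the same coding
# rule for the common character set.
try:
    "\u2460".encode("shift_jis")
    strict_shift_jis_ok = True
except UnicodeEncodeError:
    strict_shift_jis_ok = False
cp932_ok = len("\u2460".encode("cp932")) == 2
```

This is why a bare name like "Shift_JIS" in metadata is ambiguous: two systems can agree on the label and still disagree on the character set.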
Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation
Wed, Aug 28, 2019 8:57 Rok Mihevc : > > On Wed, Aug 28, 2019 at 1:18 AM Wes McKinney wrote: > > > null/NA. But, as far as I'm aware, this component of pandas is > > relatively unique and was never intended as an alternatives to sparse > > matrix libraries. > > > > Another example is > https://sparse.pydata.org/en/latest/generated/sparse.SparseArray.html?highlight=fill%20value#sparse.SparseArray.fill_value, > but it might have been influenced by Pandas. pydata/sparse's COO tensor also has a fill_value property, and it raises a ValueError in its to_scipy_sparse method when the tensor has a non-zero fill value. So we should support fill values someday, I think. > I'm ok with dropping this for now. Yes, we can advance without it and support it later. And I think supporting fill values is not difficult. -- Kenta Murata
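The fill-value semantics discussed here can be sketched in plain Python. This is an illustration of the concept only, not Arrow or pydata/sparse API; `densify` is a hypothetical helper.

```python
# A COO-style sparse matrix whose "background" entries take fill_value
# instead of zero. pydata/sparse exposes this as a fill_value property,
# and (as noted in the thread) its to_scipy_sparse raises ValueError for
# non-zero fill values, since scipy assumes a zero background.
def densify(coords, data, shape, fill_value=0):
    rows, cols = shape
    out = [[fill_value] * cols for _ in range(rows)]
    for (i, j), v in zip(coords, data):
        out[i][j] = v
    return out

dense = densify([(0, 0), (1, 2)], [5, 7], (2, 3), fill_value=1)
```

A format-level fill value would let such tensors round-trip through Arrow without first materializing the background entries.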
Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation
Wed, Aug 28, 2019 6:05 Wes McKinney : > I'm also OK with these changes. Since we have not established a > versioning or compatibility policy with regards to "Other" data > structures like Tensor and SparseTensor, I don't know that a vote is > needed, just a pull request. I hadn't realized that Tensor and SparseTensor aren't restricted by a versioning and compatibility policy. OK, I'll send some pull-requests. -- Kenta Murata
[jira] [Created] (ARROW-6393) [C++]Add EqualOptions support in SparseTensor::Equals
Kenta Murata created ARROW-6393: --- Summary: [C++]Add EqualOptions support in SparseTensor::Equals Key: ARROW-6393 URL: https://issues.apache.org/jira/browse/ARROW-6393 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata Assignee: Kenta Murata SparseTensor::Equals should take an EqualOptions argument as Tensor::Equals does. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[DISCUSSION] Automatically adding the URL of the corresponding JIRA ticket as a comment in GitHub pull-request
I frequently go through the following somewhat bothersome steps to open the JIRA ticket corresponding to a GitHub pull-request: 1. Select the "ARROW-" text in the title and copy it 2. Open JIRA if I haven't opened it 3. Select a ticket to open it 4. Alter the URL by pasting the text copied in step 1 5. Hit the enter key It would be better if these steps were easier. We already have a mechanism to inject a GitHub pull-request URL into the corresponding JIRA ticket. How about making a similar mechanism for the reverse link? I guess it is possible to automatically post the JIRA ticket URL as a comment on the pull-request when the "ARROW-" text appears in the title field, by using GitHub Actions. I discussed this idea with Kou; he said ursabot may be the appropriate place to implement such a feature, and he encouraged me to ask Krisztian about this. Krisztian, what do you think about this automation? Regards, Kenta Murata
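The core of the automation described above is just extracting the ticket id from the PR title and building the browse URL. A minimal sketch, assuming the usual "ARROW-NNNN: ..." title convention; `jira_url_for_title` is a hypothetical helper name, not part of any existing Arrow tooling.

```python
import re

# JIRA browse URLs follow the pattern seen throughout this list,
# e.g. https://issues.apache.org/jira/browse/ARROW-6455
JIRA_BROWSE = "https://issues.apache.org/jira/browse/"

def jira_url_for_title(title):
    """Return the JIRA URL for a PR title starting with ARROW-NNNN, else None."""
    m = re.match(r"(ARROW-\d+)", title)
    return JIRA_BROWSE + m.group(1) if m else None
```

A GitHub Action (or ursabot) would run this on the pull-request title and post the resulting URL as a comment.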
[jira] [Created] (ARROW-6319) [C++] Extract the core of NumericTensor::Value as Tensor::Value
Kenta Murata created ARROW-6319: --- Summary: [C++] Extract the core of NumericTensor::Value as Tensor::Value Key: ARROW-6319 URL: https://issues.apache.org/jira/browse/ARROW-6319 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata Assignee: Kenta Murata I'd like to enable element-wise access in Tensor class. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[DISCUSS][Format][C++] Improvement of sparse tensor format and implementation
Hi, I’d like to propose the following improvements to the sparse tensor format and implementation. (1) Make variable bit-width indices available. The main purpose of the first part of the proposal is making 32-bit indices available. It allows us to serialize scipy.sparse.csr_matrix objects etc. with 32-bit indices without converting the index arrays to 64-bit values. As Jed said in the previous discussion [1] on this ML, 32-bit indices have the advantage of a smaller memory footprint, so I strongly believe this change is necessary for sparse tensor support in Apache Arrow. To do this, we need to add both a type field in each sparse index format and a stride field in the SparseCOOIndex format. (2) Add a new COO format with separated row and column indices. scipy.sparse.coo_matrix manages the row and column indices in separate numpy arrays. That is enough for representing a sparse matrix. On the other hand, to support sparse tensors of arbitrary rank, Arrow's SparseCOOIndex manages COO indices as one matrix. Hence we need to make a copy of the indices to convert a scipy.sparse.coo_matrix to Arrow’s SparseTensor. Introducing a new COO format with separated row and column indices can resolve this issue. (3) Add SparseCSCIndex. The CSC format of sparse matrices has the advantage of faster scanning in the columnar direction, while the CSR format is faster in a row-wise scan. Because the strengths of CSC are different from those of CSR, I want to support CSC before releasing Arrow 1.0. There is a work-in-progress branch [2] for (1) above. I’d appreciate any comments or suggestions. [1] http://mail-archives.apache.org/mod_mbox/arrow-dev/201903.mbox/%3c87pnqz70rg@jedbrown.org%3e [2] https://github.com/mrkn/arrow/tree/sparse_tensor_index_value_type Regards, Kenta Murata
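The layout difference behind point (2) can be sketched in a few lines of Python. The data is illustrative; plain lists stand in for the numpy arrays and the Arrow index matrix.

```python
# scipy.sparse.coo_matrix style: row and column indices live in two
# separate 1-D arrays.
row = [0, 0, 1]
col = [0, 2, 1]

# Arrow's current SparseCOOIndex style: one nnz-by-ndim index matrix.
# Building it from the separated arrays requires copying the indices
# into the combined layout -- the copy the proposal wants to avoid.
combined = [[r, c] for r, c in zip(row, col)]
```

A COO variant that stores one index array per dimension would let scipy's layout be referenced directly, with no copy.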
[jira] [Created] (ARROW-5830) [C++] Stop using memcmp in TensorEquals
Kenta Murata created ARROW-5830: --- Summary: [C++] Stop using memcmp in TensorEquals Key: ARROW-5830 URL: https://issues.apache.org/jira/browse/ARROW-5830 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata Because memcmp is problematic for comparing floating-point values, such as NaNs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [VOTE] Release Apache Arrow 0.14.0 - RC0
I tried on Ubuntu Bionic and got build errors in grpc_ep (version 1.20.0). The error log is shown at the end of this mail. The errors are (1) the absence of the php_generator.h header file, and (2) the absence of the has_ruby_package function in the google::protobuf::FileOptions class. PHP support was introduced in protobuf version 3.3.0, and the has_ruby_package function was added in version 3.6.0. So the minimum version of protobuf should be 3.6.0 for building grpc version 1.20.0. The error log I got is below: /tmp/arrow-0.14.0.dJDu3/apache-arrow-0.14.0/cpp/build/grpc_ep-prefix/src/grpc_ep/src/compiler/php_generator.cc:21:10: fatal error: google/protobuf/compiler/php/php_generator.h: No such file or directory #include ^~ compilation terminated. make[5]: *** [CMakeFiles/grpc_plugin_support.dir/src/compiler/php_generator.cc.o] Error 1 make[5]: *** Waiting for unfinished jobs /tmp/arrow-0.14.0.dJDu3/apache-arrow-0.14.0/cpp/build/grpc_ep-prefix/src/grpc_ep/src/compiler/ruby_generator.cc: In function ‘grpc::string grpc_ruby_generator::GetServices(const FileDescriptor*)’: /tmp/arrow-0.14.0.dJDu3/apache-arrow-0.14.0/cpp/build/grpc_ep-prefix/src/grpc_ep/src/compiler/ruby_generator.cc:165:25: error: ‘const class google::protobuf::FileOptions’ has no member named ‘has_ruby_package’; did you mean ‘has_java_package’? if (file->options().has_ruby_package()) { ^~~~ has_java_package /tmp/arrow-0.14.0.dJDu3/apache-arrow-0.14.0/cpp/build/grpc_ep-prefix/src/grpc_ep/src/compiler/ruby_generator.cc:166:38: error: ‘const class google::protobuf::FileOptions’ has no member named ‘ruby_package’; did you mean ‘java_package’?
package_name = file->options().ruby_package(); ^~~~ java_package make[5]: *** [CMakeFiles/grpc_plugin_support.dir/src/compiler/ruby_generator.cc.o] Error 1 make[4]: *** [CMakeFiles/grpc_plugin_support.dir/all] Error 2 make[4]: *** Waiting for unfinished jobs make[3]: *** [all] Error 2 Regards, Kenta Murata Wed, Jul 3, 2019 3:00 Yosuke Shiro : > > Ran dev/release/verify-release-candidate.sh source 0.14.0 0 on macOS Mojave. > I got the following error, but it may be specific to my environment. > > """ > [ERROR] Failed to execute goal > pl.project13.maven:git-commit-id-plugin:2.2.2:revision (for-jars) on project > arrow-java-root: Could not complete Mojo execution...: Error: Could not get > HEAD Ref, are you sure you have set the dotGitDirectory property of this > plugin to a valid path? -> [Help 1] > [ERROR] > [ERROR] To see the full stack trace of the errors, re-run Maven with the -e > switch. > [ERROR] Re-run Maven using the -X switch to enable full debug logging. > [ERROR] > [ERROR] For more information about the errors and possible solutions, please > read the following articles: > [ERROR] [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException > + cleanup > + '[' no = yes ']' > + echo 'Failed to verify release candidate. See > /var/folders/8t/lw8gghw13hscdt7rr9j8kqnmgn/T/arrow-0.14.0.X.1bxYZ8hL > for details.' > Failed to verify release candidate. See > /var/folders/8t/lw8gghw13hscdt7rr9j8kqnmgn/T/arrow-0.14.0.X.1bxYZ8hL > for details.
> “"" > > """ > $ java --version > openjdk 11.0.2 2019-01-15 > OpenJDK Runtime Environment 18.9 (build 11.0.2+9) > OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode) > “"" > > """ > $ mvn --version > Apache Maven 3.6.0 (97c98ec64a1fdfee7767ce5ffb20918da4f719f3; > 2018-10-25T03:41:47+09:00) > Maven home: /usr/local/Cellar/maven/3.6.0/libexec > Java version: 11.0.2, vendor: Oracle Corporation, runtime: > /Library/Java/JavaVirtualMachines/jdk-11.0.2.jdk/Contents/Home > Default locale: en_JP, platform encoding: UTF-8 > OS name: "mac os x", version: "10.14.4", arch: "x86_64", family: "mac" > “"" > > C# verification ran fine to me. > > > > On Jul 2, 2019, at 18:57, Antoine Pitrou wrote: > > > > > > +1 for this RC0 anyway (binding). > > > > > > Le 02/07/2019 à 11:36, Antoine Pitrou a écrit : > >> > >> I tried again (Ubuntu 18.04): > >> > >> * binaries verification succeeded > >> > >> * source verification failed in gRPC configure step: > >> > >> CMake Error at cmake/cares.cmake:38 (find_package): > >> Could not find a package configuration file provided by "c-ares" with any > >> of the following names: > >> > >>c-aresConfig.cmak
[jira] [Created] (ARROW-5813) [C++] Support checking the equality of the different contiguous tensors
Kenta Murata created ARROW-5813: --- Summary: [C++] Support checking the equality of the different contiguous tensors Key: ARROW-5813 URL: https://issues.apache.org/jira/browse/ARROW-5813 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata Assignee: Kenta Murata The current TensorEquals function cannot check the equality of tensors that have different contiguity. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5754) [C++]Missing override for ~GrpcStreamWriter?
Kenta Murata created ARROW-5754: --- Summary: [C++]Missing override for ~GrpcStreamWriter? Key: ARROW-5754 URL: https://issues.apache.org/jira/browse/ARROW-5754 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata I encountered the following compile error: {{../src/arrow/flight/client.cc:244:3: error: '~GrpcStreamWriter' overrides a destructor but is not marked 'override' [-Werror,-Winconsistent-missing-destructor-override] ~GrpcStreamWriter() = default; ^ ../src/arrow/flight/client.h:86:27: note: overridden virtual function is here class ARROW_FLIGHT_EXPORT FlightStreamWriter : public ipc::RecordBatchWriter { ^}} Adding the override modifier resolves this problem. I'll make a pull-request for the change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5736) [Format] Support small bit-width indices of sparse tensor
Kenta Murata created ARROW-5736: --- Summary: [Format] Support small bit-width indices of sparse tensor Key: ARROW-5736 URL: https://issues.apache.org/jira/browse/ARROW-5736 Project: Apache Arrow Issue Type: Improvement Components: Format Reporter: Kenta Murata Assignee: Kenta Murata Adding 32-bit sparse index support is necessary for zero-copy data sharing with existing systems such as SciPy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5704) [C++] Stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl class
Kenta Murata created ARROW-5704: --- Summary: [C++] Stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl class Key: ARROW-5704 URL: https://issues.apache.org/jira/browse/ARROW-5704 Project: Apache Arrow Issue Type: Task Components: C++ Reporter: Kenta Murata Assignee: Kenta Murata I'd like to stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl class so that it can be wrapped in Arrow GLib library on the mingw platform. This relates to ARROW-4399. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5486) [GLib] Add binding of gandiva::FunctionRegistry and related things
Kenta Murata created ARROW-5486: --- Summary: [GLib] Add binding of gandiva::FunctionRegistry and related things Key: ARROW-5486 URL: https://issues.apache.org/jira/browse/ARROW-5486 Project: Apache Arrow Issue Type: New Feature Components: GLib Reporter: Kenta Murata Assignee: Kenta Murata I'd like to add a support of gandiva::FunctionRegistry and the related things in gandiva-glib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5320) [C++] Undefined symbol errors are occurred when linking parquet executables
Kenta Murata created ARROW-5320: --- Summary: [C++] Undefined symbol errors are occurred when linking parquet executables Key: ARROW-5320 URL: https://issues.apache.org/jira/browse/ARROW-5320 Project: Apache Arrow Issue Type: Bug Components: C++ Environment: Xcode 10.2 on macOS Mojave 10.14.4 Reporter: Kenta Murata Undefined symbol errors occurred when linking debug/parquet-reader, debug/parquet-file-deserialize-test, and debug/parquet-scan. The unresolved symbols are boost regex symbols referenced from libparquet.a. I tried to build commit 608e846a9f825a30a0faa651bc0a3eebba20e7db with Xcode 10.2 on macOS Mojave. I specified -DARROW_BOOST_VENDORED=ON to avoid the problem related to the latest boost in Homebrew (See [https://github.com/boostorg/process/issues/55]). The complete build log is available here: [https://gist.github.com/mrkn/e5489140c9a782ca13a1b4bb8dd33111] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5155) [GLib][Ruby] MakeDense and MakeSparse in UnionArray should accept a vector of Field
Kenta Murata created ARROW-5155: --- Summary: [GLib][Ruby] MakeDense and MakeSparse in UnionArray should accept a vector of Field Key: ARROW-5155 URL: https://issues.apache.org/jira/browse/ARROW-5155 Project: Apache Arrow Issue Type: New Feature Components: GLib, Ruby Reporter: Kenta Murata Assignee: Kenta Murata This is a derivative issue of https://issues.apache.org/jira/browse/ARROW-4622 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5150) [Ruby] Add Arrow::Table#raw_records
Kenta Murata created ARROW-5150: --- Summary: [Ruby] Add Arrow::Table#raw_records Key: ARROW-5150 URL: https://issues.apache.org/jira/browse/ARROW-5150 Project: Apache Arrow Issue Type: New Feature Components: Ruby Reporter: Kenta Murata Assignee: Kenta Murata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5050) [C++] cares_ep should build before grpc_ep
Kenta Murata created ARROW-5050: --- Summary: [C++] cares_ep should build before grpc_ep Key: ARROW-5050 URL: https://issues.apache.org/jira/browse/ARROW-5050 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Kenta Murata Assignee: Kenta Murata I found that grpc_ep can fail to find cares_ep because grpc_ep may be built before cares_ep. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5032) [C++] Headers in vendored/datetime directory aren't installed
Kenta Murata created ARROW-5032: --- Summary: [C++] Headers in vendored/datetime directory aren't installed Key: ARROW-5032 URL: https://issues.apache.org/jira/browse/ARROW-5032 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Kenta Murata I found that the header files in the vendored/datetime directory are not installed even though vendored/datetime.h is installed. vendored/datetime.h depends on the files in the vendored/datetime directory, so they should be installed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4942) [Ruby] Remove needless omits
Kenta Murata created ARROW-4942: --- Summary: [Ruby] Remove needless omits Key: ARROW-4942 URL: https://issues.apache.org/jira/browse/ARROW-4942 Project: Apache Arrow Issue Type: Test Components: Ruby Reporter: Kenta Murata Assignee: Kenta Murata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4938) [Glib] Undefined symbols error occurred when GIR file is being generated.
Kenta Murata created ARROW-4938: --- Summary: [Glib] Undefined symbols error occurred when GIR file is being generated. Key: ARROW-4938 URL: https://issues.apache.org/jira/browse/ARROW-4938 Project: Apache Arrow Issue Type: Bug Components: GLib Reporter: Kenta Murata When there are old arrow-glib.*dylib files in the installation directory and these libraries don't have all the required symbols, an "undefined symbols" error occurs while the GIR file is being generated. When I encountered this error, removing the old libraries resolved the problem. I extracted the build log related to this problem in this gist: https://gist.github.com/mrkn/6c14d5cae2bebca4609ed9c3ef8e5bbf -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4932) [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
Kenta Murata created ARROW-4932: --- Summary: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro Key: ARROW-4932 URL: https://issues.apache.org/jira/browse/ARROW-4932 Project: Apache Arrow Issue Type: Task Components: GLib Reporter: Kenta Murata Assignee: Kenta Murata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4906) [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is sorted for each row
Kenta Murata created ARROW-4906: --- Summary: [Format] Fix document to describe that SparseMatrixIndexCSR assumes indptr is sorted for each row Key: ARROW-4906 URL: https://issues.apache.org/jira/browse/ARROW-4906 Project: Apache Arrow Issue Type: Bug Reporter: Kenta Murata Assignee: Kenta Murata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Sparse matrix formats
Hi Jed, I'd like to describe the current status of the SparseTensor implementation. I hope the following explanation will help you. First of all, I designed the current SparseTensor format as a first interim implementation, using scipy.sparse as a reference. The reason I started with the two formats, COO and CSR, is that I thought they were commonly used and appropriate as the first implementation of SparseTensor. For 1: The current SparseTensor uses int64_t as its index value type. If the "long" you mentioned is the one in SparseTensor.fbs, that "long" is a Flatbuffers type, which is a 64-bit signed integer. I hope we can improve the current SparseTensor so that users can specify whether to use int32_t or int64_t as the index data type. Then we can use int64_t to handle larger matrices, as well as improve performance with 32-bit indices. Moreover, we can share not-too-large canonicalized scipy sparse matrices without conversion. For 2: There are two reasons why the current COO format is required to be sorted: a) I wanted to start with a simple implementation b) I thought that Arrow is often used to exchange sparse matrices that are canonicalized for computation I think it would be better for Arrow to support unsorted indices, and it would be even better to be able to handle sparse matrices with duplicated values. Since scipy.sparse can handle such data, I think supporting non-canonicalized formats is necessary for adding scipy integration. I forgot to note that the CSR index is supposed to be sorted. I'll open a PR to fix that. If SparseTensor supports unsorted indices or duplicated values, it would be nice to add the ability to canonicalize them as well. I am busy with preparations for RubyKaigi in mid-April now, but after RubyKaigi I will return to improving the sparse matrix implementation. Before then, I would like to help as much as I can if someone else is working on it.
For 3: Since I have only used sparse matrices to create and process bag-of-words matrices in natural language processing, I have no experience with partitioning sparse matrices for distributed computing. I think it would be great if Arrow could handle such a use case, so I would like to cooperate if there is something I can do toward its realization. Thanks, Kenta Murata Tue, Mar 12, 2019 2:17 Jed Brown : > > Thanks. I'm new to the Arrow community so was hoping to get feedback if > any of these are controversial or subject to constraints that I'm likely > not familiar with. Point 2 is likely simplest and I can start with > that. > > Point 3 isn't coherent as a PR concept, but is a potential audience > whose relation to Arrow I don't understand (could be an explicit > non-goal for all I know). > > Wes McKinney writes: > > > hi Jed, > > > > Would you like to submit a pull request to propose the changes or > > additions you are describing? > > > > Thanks > > Wes > > > > On Sat, Mar 9, 2019 at 11:32 PM Jed Brown wrote: > >> > >> Wes asked me to bring this discussion here. I'm a developer of PETSc > >> and, with Arrow getting into the sparse representation space, would > >> like for it to interoperate as well as possible. > >> > >> 1. Please guarantee support for 64-bit offsets and indices. The current > >> spec uses "long", which is 32-bit on some common platforms (notably > >> Windows). Specifying int64_t would bring that size support to LLP64 > >> architectures. > >> > >> Meanwhile, there is a significant performance benefit to using 32-bit > >> indices when sizes allow it. If using 4-byte floats, a CSR format costs > >> 12 bytes per nonzero with int64, versus 8 bytes per nonzero with int32. > >> Sparse matrix operations are typically dominated by bandwidth, so this > >> is a substantial performance impact. > >> > >> 2. This relates to sorting of indices for CSR.
Many algorithms need to > >> operate on upper vs lower-triangular parts of matrices, which is much > >> easier if the CSR column indices are sorted within rows. Typically, one > >> finds the diagonal (storing its location in each row, if it exists). > >> Given that the current spec says COO entries are sorted, it would be > >> simple to specify this also for CSR. > >> > >> https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs > >> > >> > >> 3. This is more nebulous and relates to partitioned representations of > >> matrices, such as arise when using distributed memory or when optimizing > >> for locality when using threads. Some packages store "global" indices > >> in the CSR representation (in which case yo
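The per-nonzero figures Jed quotes can be checked with a line of arithmetic: a CSR nonzero stores one value plus one column index, with the indptr array amortized over rows.

```python
# Jed's CSR cost arithmetic: 4-byte float values, counting one value plus
# one column index per nonzero (indptr is amortized over rows and ignored).
VALUE_BYTES = 4  # float32

def csr_bytes_per_nnz(index_bytes):
    return VALUE_BYTES + index_bytes

cost_int32 = csr_bytes_per_nnz(4)  # 32-bit indices
cost_int64 = csr_bytes_per_nnz(8)  # 64-bit indices
```

Since sparse kernels are bandwidth-bound, the 12-versus-8-byte difference translates almost directly into throughput, which is the motivation for variable bit-width indices.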
[jira] [Created] (ARROW-4775) [Website] Site navbar cannot be expanded
Kenta Murata created ARROW-4775: --- Summary: [Website] Site navbar cannot be expanded Key: ARROW-4775 URL: https://issues.apache.org/jira/browse/ARROW-4775 Project: Apache Arrow Issue Type: Bug Components: Website Reporter: Kenta Murata Assignee: Kenta Murata I found that the navbar at the top of the page cannot be expanded when the page is narrow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4671) [C++] MakBuilder must care Type::DICTIONARY
Kenta Murata created ARROW-4671: --- Summary: [C++] MakBuilder must care Type::DICTIONARY Key: ARROW-4671 URL: https://issues.apache.org/jira/browse/ARROW-4671 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Kenta Murata Currently we cannot create a builder for DictionaryArray by using MakeBuilder. When we pass a DictionaryType to MakeBuilder, it fails with a message like below: {quote}MakeBuilder: cannot construct builder for type dictionary {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4662) [Python] Add type_codes property in UnionType
Kenta Murata created ARROW-4662: --- Summary: [Python] Add type_codes property in UnionType Key: ARROW-4662 URL: https://issues.apache.org/jira/browse/ARROW-4662 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Kenta Murata Assignee: Kenta Murata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4632) [Ruby] Add BigDecimal#to_arrow
Kenta Murata created ARROW-4632: --- Summary: [Ruby] Add BigDecimal#to_arrow Key: ARROW-4632 URL: https://issues.apache.org/jira/browse/ARROW-4632 Project: Apache Arrow Issue Type: New Feature Components: Ruby Reporter: Kenta Murata Assignee: Kenta Murata It may be better for BigDecimal to have a to_arrow instance method that converts it to Arrow::Decimal128. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4622) MakeDense and MakeSparse in UnionArray should accept a vector of Field
Kenta Murata created ARROW-4622: --- Summary: MakeDense and MakeSparse in UnionArray should accept a vector of Field Key: ARROW-4622 URL: https://issues.apache.org/jira/browse/ARROW-4622 Project: Apache Arrow Issue Type: Bug Components: C++, GLib, Python Reporter: Kenta Murata Assignee: Kenta Murata Currently, MakeDense and MakeSparse of UnionArray cannot create a UnionArray with user-specified field names. This is a bug in these functions. To fix them, optional std::vector arguments should be added. The GLib and Python bindings should be fixed together. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4600) [Ruby] Arrow::DictionaryArray#[] should returns the item in the indices array
Kenta Murata created ARROW-4600: --- Summary: [Ruby] Arrow::DictionaryArray#[] should returns the item in the indices array Key: ARROW-4600 URL: https://issues.apache.org/jira/browse/ARROW-4600 Project: Apache Arrow Issue Type: Bug Components: Ruby Reporter: Kenta Murata Arrow::DictionaryArray#[] should return the item in the indices array. However, the current behavior is an error like below: {{Traceback (most recent call last):}} {{ 5: from test.rb:4:in `'}} {{ 4: from test.rb:4:in `new'}} {{ 3: from /Users/mrkn/src/github.com/apache/arrow/ruby/red-arrow/lib/arrow/dictionary-data-type.rb:103:in `initialize'}} {{ 2: from /Users/mrkn/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.1/lib/gobject-introspection/loader.rb:328:in `block in load_constructor_infos'}} {{ 1: from /Users/mrkn/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.1/lib/gobject-introspection/loader.rb:317:in `block (2 levels) in load_constructor_infos'}} {{/Users/mrkn/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/gobject-introspection-3.3.1/lib/gobject-introspection/loader.rb:317:in `invoke': *invalid argument Array (expect #) (+ArgumentError+)*}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-4537) [CI] Suppress shell warning on travis-ci
Kenta Murata created ARROW-4537:
---
Summary: [CI] Suppress shell warning on travis-ci
Key: ARROW-4537
URL: https://issues.apache.org/jira/browse/ARROW-4537
Project: Apache Arrow
Issue Type: Task
Components: Continuous Integration
Reporter: Kenta Murata

Suppress shell warnings like:

{{+'[' == 1 ']'}}
{{/home/travis/build/apache/arrow/ci/travis_before_script_cpp.sh: line 81: [: ==: unary operator expected}}
[jira] [Created] (ARROW-4536) Add data_type argument in garrow_list_array_new
Kenta Murata created ARROW-4536:
---
Summary: Add data_type argument in garrow_list_array_new
Key: ARROW-4536
URL: https://issues.apache.org/jira/browse/ARROW-4536
Project: Apache Arrow
Issue Type: Bug
Components: GLib
Reporter: Kenta Murata

This issue corresponds to GitHub pull request https://github.com/apache/arrow/pull/3621
[jira] [Created] (ARROW-4535) [C++] Fix MakeBuilder to preserve ListType's field name
Kenta Murata created ARROW-4535:
---
Summary: [C++] Fix MakeBuilder to preserve ListType's field name
Key: ARROW-4535
URL: https://issues.apache.org/jira/browse/ARROW-4535
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Kenta Murata

MakeBuilder doesn't preserve the field name in the given ListType. I think this is a bug.
[jira] [Created] (ARROW-4506) [Ruby] Add Arrow::RecordBatch#raw_records
Kenta Murata created ARROW-4506:
---
Summary: [Ruby] Add Arrow::RecordBatch#raw_records
Key: ARROW-4506
URL: https://issues.apache.org/jira/browse/ARROW-4506
Project: Apache Arrow
Issue Type: New Feature
Components: Ruby
Reporter: Kenta Murata
Assignee: Kenta Murata

I want to add Arrow::RecordBatch#raw_records method to convert a record batch object to a nested array.
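The columnar-to-rows conversion behind raw_records can be sketched in a few lines of plain Python (the real method is Ruby and operates on Arrow column objects; this is only an illustration of the idea):

```python
def raw_records(columns):
    """Convert columnar data (a list of equally sized column value
    lists) into a nested array of row records."""
    return [list(row) for row in zip(*columns)]

# Two columns of a record batch become three row records:
batch = [[1, 2, 3], ["a", "b", "c"]]
print(raw_records(batch))  # → [[1, 'a'], [2, 'b'], [3, 'c']]
```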
[jira] [Created] (ARROW-4397) [C++] dim_names in Tensor and SparseTensor
Kenta Murata created ARROW-4397:
---
Summary: [C++] dim_names in Tensor and SparseTensor
Key: ARROW-4397
URL: https://issues.apache.org/jira/browse/ARROW-4397
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Kenta Murata

Along with ARROW-4388, it would be useful to introduce dim_names in Tensor and SparseTensor of the C++ library.
[jira] [Created] (ARROW-4320) [C++] Add tests for non-contiguous tensors
Kenta Murata created ARROW-4320:
---
Summary: [C++] Add tests for non-contiguous tensors
Key: ARROW-4320
URL: https://issues.apache.org/jira/browse/ARROW-4320
Project: Apache Arrow
Issue Type: Test
Reporter: Kenta Murata
Assignee: Kenta Murata

I would like to add some test cases for tensors with non-contiguous strides.
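To make "non-contiguous strides" concrete, here is a hedged pure-Python sketch of strided element lookup. For simplicity it counts strides in elements rather than bytes (Arrow's Tensor strides are byte-counted), and the helper name is illustrative:

```python
def element_at(flat, strides, index):
    """Fetch the element at a multi-dimensional index from flat
    storage, using strides measured in elements."""
    return flat[sum(s * i for s, i in zip(strides, index))]

# A 2x3 matrix stored column-major (Fortran order), which is
# non-contiguous from a row-major point of view:
# [[1, 2, 3],
#  [4, 5, 6]]  ->  flat = [1, 4, 2, 5, 3, 6], strides = (1, 2)
flat = [1, 4, 2, 5, 3, 6]
print(element_at(flat, (1, 2), (1, 2)))  # → 6
print(element_at(flat, (1, 2), (0, 1)))  # → 2
```

Tests like the ones proposed here would exercise exactly this kind of stride arithmetic on tensors whose storage order differs from row-major.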
[jira] [Created] (ARROW-4318) [C++] Add Tensor::CountNonZero
Kenta Murata created ARROW-4318:
---
Summary: [C++] Add Tensor::CountNonZero
Key: ARROW-4318
URL: https://issues.apache.org/jira/browse/ARROW-4318
Project: Apache Arrow
Issue Type: New Feature
Reporter: Kenta Murata
Assignee: Kenta Murata

I would like to move CountNonZero defined in SparseTensorConverter into Tensor class, and add tests for this function. The pull-request is [https://github.com/apache/arrow/pull/3452].
[jira] [Created] (ARROW-4226) [C++] Add CSF sparse tensor support
Kenta Murata created ARROW-4226:
---
Summary: [C++] Add CSF sparse tensor support
Key: ARROW-4226
URL: https://issues.apache.org/jira/browse/ARROW-4226
Project: Apache Arrow
Issue Type: New Feature
Reporter: Kenta Murata

[https://github.com/apache/arrow/pull/2546#pullrequestreview-156064172]

{quote}
Perhaps in the future, if zero-copy and future-proof-ness is really what we want, we might want to add the CSF (compressed sparse fiber) format, a generalisation of CSR/CSC. I'm currently working on adding it to PyData/Sparse, and I plan to make it the preferred format (COO will still be around though).
{quote}
[jira] [Created] (ARROW-4225) [C++] Add CSC sparse matrix support
Kenta Murata created ARROW-4225:
---
Summary: [C++] Add CSC sparse matrix support
Key: ARROW-4225
URL: https://issues.apache.org/jira/browse/ARROW-4225
Project: Apache Arrow
Issue Type: New Feature
Reporter: Kenta Murata

CSC sparse matrix support is necessary for integration with existing sparse matrix libraries (umfpack, superlu).

https://github.com/apache/arrow/pull/2546#issuecomment-422135645
[jira] [Created] (ARROW-4224) [Python] Support integration with pydata/sparse library
Kenta Murata created ARROW-4224:
---
Summary: [Python] Support integration with pydata/sparse library
Key: ARROW-4224
URL: https://issues.apache.org/jira/browse/ARROW-4224
Project: Apache Arrow
Issue Type: New Feature
Reporter: Kenta Murata

It would be great to support integration with the pydata/sparse library.
[jira] [Created] (ARROW-4223) [Python] Support scipy.sparse integration
Kenta Murata created ARROW-4223:
---
Summary: [Python] Support scipy.sparse integration
Key: ARROW-4223
URL: https://issues.apache.org/jira/browse/ARROW-4223
Project: Apache Arrow
Issue Type: Improvement
Reporter: Kenta Murata

It would be great to support integration with scipy.sparse.
[jira] [Created] (ARROW-4221) [Format] Add canonical flag in COO sparse index
Kenta Murata created ARROW-4221:
---
Summary: [Format] Add canonical flag in COO sparse index
Key: ARROW-4221
URL: https://issues.apache.org/jira/browse/ARROW-4221
Project: Apache Arrow
Issue Type: Improvement
Reporter: Kenta Murata

To support the integration with scipy.sparse.coo_matrix, it is necessary to add a flag in SparseCOOIndex. This flag denotes whether the elements in a COO sparse tensor are sorted lexicographically or not.
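The "canonical" property the flag would record can be sketched in plain Python — the coordinate representation below (one index tuple per nonzero element) is assumed for illustration, not Arrow's actual layout:

```python
def is_canonical(coords):
    """Return True if the COO coordinates (one index tuple per
    nonzero element) are sorted lexicographically, with no
    duplicate coordinates."""
    # Python compares tuples lexicographically, so strict ordering
    # of adjacent pairs is exactly the canonical-form condition.
    return all(a < b for a, b in zip(coords, coords[1:]))

print(is_canonical([(0, 1), (0, 2), (1, 0)]))  # → True
print(is_canonical([(1, 0), (0, 2)]))          # → False
```

Recording this as a flag lets a reader skip the sortedness check when converting to formats like scipy.sparse.coo_matrix that can exploit canonical ordering.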
[jira] [Created] (ARROW-4222) [C++] Support equality comparison between COO and CSR sparse tensors in SparseTensorEquals
Kenta Murata created ARROW-4222:
---
Summary: [C++] Support equality comparison between COO and CSR sparse tensors in SparseTensorEquals
Key: ARROW-4222
URL: https://issues.apache.org/jira/browse/ARROW-4222
Project: Apache Arrow
Issue Type: Improvement
Reporter: Kenta Murata

Currently SparseTensorEquals always returns false when given a COO sparse tensor and a CSR sparse tensor. It should instead compare the items of the two sparse tensors.
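One possible way to compare across the two formats, sketched here in plain Python with an assumed (indptr, indices, data) CSR layout and (row, col, value) COO triplets, is to expand the CSR matrix into canonical COO triplets and compare those:

```python
def csr_to_coo(indptr, indices, data):
    """Expand CSR arrays into (row, col, value) triplets."""
    triplets = []
    for row in range(len(indptr) - 1):
        for k in range(indptr[row], indptr[row + 1]):
            triplets.append((row, indices[k], data[k]))
    return triplets

def coo_equals_csr(coo_triplets, indptr, indices, data):
    """Compare a COO tensor with a CSR matrix element by element,
    ignoring the order in which nonzeros are stored."""
    return sorted(coo_triplets) == sorted(csr_to_coo(indptr, indices, data))

# 2x3 matrix with nonzeros at (0,1)=5 and (1,2)=7 in both formats:
coo = [(0, 1, 5), (1, 2, 7)]
print(coo_equals_csr(coo, [0, 1, 2], [1, 2], [5, 7]))  # → True
```

This is only a sketch of the comparison strategy; the real SparseTensorEquals would also need to compare shapes and value types.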
[jira] [Created] (ARROW-3518) Detect HOMEBREW_PREFIX automatically
Kenta Murata created ARROW-3518:
---
Summary: Detect HOMEBREW_PREFIX automatically
Key: ARROW-3518
URL: https://issues.apache.org/jira/browse/ARROW-3518
Project: Apache Arrow
Issue Type: Improvement
Reporter: Kenta Murata

It can be detected by executing {{brew --prefix}} if available.
[jira] [Created] (ARROW-3515) Introduce NumericTensor class
Kenta Murata created ARROW-3515:
---
Summary: Introduce NumericTensor class
Key: ARROW-3515
URL: https://issues.apache.org/jira/browse/ARROW-3515
Project: Apache Arrow
Issue Type: New Feature
Reporter: Kenta Murata

[https://github.com/apache/arrow/pull/2759]

This commit defines the new NumericTensor class as a subclass of the Tensor class. NumericTensor extends Tensor by adding a member function to access element values in a tensor. I want to use this new feature for writing tests of SparseTensor in [#2546|https://github.com/apache/arrow/pull/2546].
Re: [DISCUSS] Concerns about the Arrow Slack channel
Hi everyone,

I heard from Kou that you're discussing whether to stop using Slack. So I want to propose another option: Discourse.

On 2018/06/21 18:46:54, Dhruv Madeka wrote:
> The issue with discourse is that you either have to host it or pay for them
> to host it

Discourse provides a free hosting plan for community-friendly open-source projects. See this article for the details:
<https://blog.discourse.org/2016/03/free-discourse-forum-hosting-for-community-friendly-github-projects/>

> but still +1 for discourse, its a really nice format (I actually +1'ed the
> PyTorch forum on this thread too)

I'm also +1 for Discourse because I'm managing https://discourse.ruby-data.org/ under this plan.

Regards,
Kenta Murata