[jira] [Commented] (ARROW-6485) [Format][C++]Support the format of a COO sparse matrix that has separated row and column indices

2021-12-15 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459900#comment-17459900
 ] 

Kenta Murata commented on ARROW-6485:
-

[~apitrou] I’m still interested in this feature, but I doubt that people 
want Apache Arrow to have it, because nobody has ever mentioned it.

> [Format][C++]Support the format of a COO sparse matrix that has separated row 
> and column indices
> 
>
> Key: ARROW-6485
> URL: https://issues.apache.org/jira/browse/ARROW-6485
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Format
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To support zero-copy interchange with scipy.sparse.coo_matrix, I'd like to 
> add a new COO matrix format that has separate row and column indices.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-14518) [Ruby] ArrayBuilder doesn't work correctly with Decimal

2021-10-31 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436613#comment-17436613
 ] 

Kenta Murata commented on ARROW-14518:
--

{quote}Could you add BigDecimal#scale?{quote}

Yes, I'll add it. But the new property will be available only in the latest 
{{bigdecimal}} gem.
To support older versions of the {{bigdecimal}} gem, we need to detect whether 
{{BigDecimal#scale}} exists and emulate it when it is absent.
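
The detection and emulation described above could look roughly like the following. This is only a sketch: the {{DecimalScale}} module and its method names are hypothetical, and the emulation derives the scale from {{BigDecimal#split}}, which is one possible approach and not necessarily what the gem or Red Arrow does.

```ruby
require "bigdecimal"

# Hypothetical helper, not part of the bigdecimal gem or Red Arrow.
module DecimalScale
  module_function

  # Use the native BigDecimal#scale when the installed gem provides it,
  # otherwise fall back to the emulation below.
  def scale_of(value)
    return value.scale if value.respond_to?(:scale)
    emulated_scale(value)
  end

  # Emulate the scale (number of digits after the decimal point).
  # BigDecimal#split returns [sign, digits, base, exponent], where the
  # value equals 0.<digits> * 10**exponent.
  def emulated_scale(value)
    return 0 if !value.finite? || value.zero?
    _sign, digits, _base, exponent = value.split
    digits = digits.sub(/0+\z/, "") # defensive: ignore trailing zeros
    [digits.length - exponent, 0].max
  end
end
```

For example, {{DecimalScale.emulated_scale(BigDecimal("0.001"))}} yields 3, because the value is stored as 0.1 * 10**-2. Trailing zeros in the input string are not preserved by {{BigDecimal}}, so "1.10" and "1.1" give the same result.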

> [Ruby] ArrayBuilder doesn't work correctly with Decimal
> ---
>
> Key: ARROW-14518
> URL: https://issues.apache.org/jira/browse/ARROW-14518
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Ruby
>Reporter: Kanstantsin Ilchanka
>Priority: Minor
>
> When trying to convert raw data with decimal values to an Arrow::Table, an 
> error is raised:
>  
> {code:java}
> Arrow::Table.new(x: [BigDecimal('1.1')])
> ArgumentError: wrong arguments: Arrow::Decimal128ArrayBuilder#initialize(): 
> available signatures: (data_type: 
> interface(Arrow::Decimal128DataType(GArrowDecimal128DataType)))
> {code}
> I guess this is because Decimal128ArrayBuilder expects a Decimal128DataType in 
> its initialiser. However, I'm not sure how to correctly and effectively detect 
> the precision and scale from an array of BigDecimal values.
>  
> {code:java}
> Arrow::VERSION
> => "5.0.0"{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-14518) [Ruby] ArrayBuilder doesn't work correctly with Decimal

2021-10-31 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436611#comment-17436611
 ] 

Kenta Murata commented on ARROW-14518:
--

Arrow's decimal number is completely different from {{BigDecimal}}.  The former 
is a fixed-point number whereas the latter is a floating-point number.

The best approach is to introduce a new fixed-point number system for Arrow's 
decimal numbers on the Ruby side.

The second-best approach, in my opinion, is to let 
{{Arrow::Decimal128ArrayBuilder}} support arbitrary precisions and scales. That 
can be done by pooling {{BigDecimal}} values.

It's OK to add a new property to {{BigDecimal}} that returns the number of 
digits following the decimal point, to assist the latter approach. A suitable 
name for this property may be {{BigDecimal#scale}}.
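
One way to read "pooling" is to buffer the values first and then pick a single precision and scale that fits all of them. The sketch below illustrates that idea only; the {{DecimalPool}} module and its methods are hypothetical, it does not call any Arrow API, and it assumes the 38-digit precision limit of Arrow's 128-bit decimal type.

```ruby
require "bigdecimal"

# Hypothetical helper, not part of Red Arrow or the bigdecimal gem.
module DecimalPool
  # Assumed limit: Arrow's Decimal128 holds at most 38 decimal digits.
  MAX_DECIMAL128_PRECISION = 38

  module_function

  # Returns [integer_digits, fraction_digits] for one finite BigDecimal.
  # BigDecimal#split gives digits and exponent such that the value equals
  # 0.<digits> * 10**exponent.
  def digit_counts(value)
    raise ArgumentError, "non-finite value: #{value}" unless value.finite?
    return [0, 0] if value.zero?
    _sign, digits, _base, exponent = value.split
    digits = digits.sub(/0+\z/, "") # defensive: ignore trailing zeros
    [[exponent, 0].max, [digits.length - exponent, 0].max]
  end

  # Returns the smallest [precision, scale] that can hold every pooled value.
  def precision_and_scale(values)
    counts = values.map { |value| digit_counts(value) }
    integer_digits = counts.map(&:first).max || 0
    scale = counts.map(&:last).max || 0
    precision = [integer_digits + scale, 1].max
    if precision > MAX_DECIMAL128_PRECISION
      raise ArgumentError, "precision #{precision} does not fit in Decimal128"
    end
    [precision, scale]
  end
end
```

With the values 1.1, 22.25, and 0.001, this returns a precision of 5 and a scale of 3: two integer digits from 22.25 plus three fraction digits from 0.001.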



[jira] [Comment Edited] (ARROW-14518) [Ruby] ArrayBuilder doesn't work correctly with Decimal

2021-10-30 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436352#comment-17436352
 ] 

Kenta Murata edited comment on ARROW-14518 at 10/30/21, 5:39 PM:
-

A {{BigDecimal}} number manages only its precision in its decimal notation.

We can get the precision of a {{BigDecimal}} number with the 
{{BigDecimal#precision}} method.

{{irb(main):001:0> BigDecimal("1.1").precision}}
{{=> 2}}

But, the {{BigDecimal#precision}} method doesn't count the trailing zeros.

{{irb(main):002:0> BigDecimal("1.10").precision}}
{{=> 2}}

The reason why a {{BigDecimal}} number doesn't have the scale property may be 
that a {{BigDecimal}} number isn't a fixed-precision number.


was (Author: mrkn):
A {{BigDecimal}} number manages only its precision in its decimal notation.

We can get the precision of a {{BigDecimal}} number with the 
{{BigDecimal#precision}} method.

{{irb(main):001:0> BigDecimal("1.1").precision}}
{{=> 2}}

But, the {{BigDecimal#precision}} method does not count the trailing zeros.

{{irb(main):002:0> BigDecimal("1.10").precision}}
{{=> 2}}

The reason why a {{BigDecimal}} number doesn't have the scale property may be 
that a {{BigDecimal}} number isn't a fixed-precision number.



[jira] [Commented] (ARROW-14518) [Ruby] ArrayBuilder doesn't work correctly with Decimal

2021-10-30 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436352#comment-17436352
 ] 

Kenta Murata commented on ARROW-14518:
--

A {{BigDecimal}} number manages only its precision in its decimal notation.

We can get the precision of a {{BigDecimal}} number with the 
{{BigDecimal#precision}} method.

{{irb(main):001:0> BigDecimal("1.1").precision}}
{{=> 2}}

But, a {{BigDecimal#precision}} does not count the trailing zeros.

{{irb(main):002:0> BigDecimal("1.10").precision}}
{{=> 2}}

The reason why a {{BigDecimal}} number doesn't have the scale property may be 
that a {{BigDecimal}} number isn't a fixed-precision number.



[jira] [Comment Edited] (ARROW-14518) [Ruby] ArrayBuilder doesn't work correctly with Decimal

2021-10-30 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436352#comment-17436352
 ] 

Kenta Murata edited comment on ARROW-14518 at 10/30/21, 5:38 PM:
-

A {{BigDecimal}} number manages only its precision in its decimal notation.

We can get the precision of a {{BigDecimal}} number with the 
{{BigDecimal#precision}} method.

{{irb(main):001:0> BigDecimal("1.1").precision}}
{{=> 2}}

But, the {{BigDecimal#precision}} method does not count the trailing zeros.

{{irb(main):002:0> BigDecimal("1.10").precision}}
{{=> 2}}

The reason why a {{BigDecimal}} number doesn't have the scale property may be 
that a {{BigDecimal}} number isn't a fixed-precision number.


was (Author: mrkn):
A {{BigDecimal}} number manages only its precision in its decimal notation.

We can get the precision of a {{BigDecimal}} number with the 
{{BigDecimal#precision}} method.

{{irb(main):001:0> BigDecimal("1.1").precision}}
{{=> 2}}

But, a {{BigDecimal#precision}} does not count the trailing zeros.

{{irb(main):002:0> BigDecimal("1.10").precision}}
{{=> 2}}

The reason why a {{BigDecimal}} number doesn't have the scale property may be 
that a {{BigDecimal}} number isn't a fixed-precision number.



[jira] [Created] (ARROW-11974) [GLib] Add CsvFragmentScanOption support

2021-03-16 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11974:


 Summary: [GLib] Add CsvFragmentScanOption support
 Key: ARROW-11974
 URL: https://issues.apache.org/jira/browse/ARROW-11974
 Project: Apache Arrow
  Issue Type: New Feature
  Components: GLib
Reporter: Kenta Murata








[jira] [Resolved] (ARROW-11850) [GLib] GARROW_VERSION_0_16 macro is missing

2021-03-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata resolved ARROW-11850.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 9623
[https://github.com/apache/arrow/pull/9623]

> [GLib] GARROW_VERSION_0_16 macro is missing
> ---
>
> Key: ARROW-11850
> URL: https://issues.apache.org/jira/browse/ARROW-11850
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: GLib
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{GARROW_VERSION_0_16}} macro is missing from arrow-glib/version.h.
> The absence of this macro causes warnings like the ones below:
> {code}
> compiling ../../../../ext/arrow-nmatrix/arrow-nmatrix.c
> In file included from /opt/arrow-dbg/include/arrow-glib/arrow-glib.h:23,
>  from ../../../../ext/arrow-nmatrix/arrow-nmatrix.c:17:
> /opt/arrow-dbg/include/arrow-glib/version.h:297:36: warning: 
> "GARROW_VERSION_0_16" is not defined, evaluates to 0 [-Wundef]
>   297 | #if GARROW_VERSION_MIN_REQUIRED >= GARROW_VERSION_0_16
>   |^~~
> /opt/arrow-dbg/include/arrow-glib/version.h:305:34: warning: 
> "GARROW_VERSION_0_16" is not defined, evaluates to 0 [-Wundef]
>   305 | #if GARROW_VERSION_MAX_ALLOWED < GARROW_VERSION_0_16
>   |  ^~~
> {code}





[jira] [Created] (ARROW-11850) [GLib] GARROW_VERSION_0_16 macro is missing

2021-03-02 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11850:


 Summary: [GLib] GARROW_VERSION_0_16 macro is missing
 Key: ARROW-11850
 URL: https://issues.apache.org/jira/browse/ARROW-11850
 Project: Apache Arrow
  Issue Type: Bug
  Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata


The {{GARROW_VERSION_0_16}} macro is missing from arrow-glib/version.h.
The absence of this macro causes warnings like the ones below:


{code}
compiling ../../../../ext/arrow-nmatrix/arrow-nmatrix.c
In file included from /opt/arrow-dbg/include/arrow-glib/arrow-glib.h:23,
 from ../../../../ext/arrow-nmatrix/arrow-nmatrix.c:17:
/opt/arrow-dbg/include/arrow-glib/version.h:297:36: warning: 
"GARROW_VERSION_0_16" is not defined, evaluates to 0 [-Wundef]
  297 | #if GARROW_VERSION_MIN_REQUIRED >= GARROW_VERSION_0_16
  |^~~
/opt/arrow-dbg/include/arrow-glib/version.h:305:34: warning: 
"GARROW_VERSION_0_16" is not defined, evaluates to 0 [-Wundef]
  305 | #if GARROW_VERSION_MAX_ALLOWED < GARROW_VERSION_0_16
  |  ^~~
{code}






[jira] [Assigned] (ARROW-11782) [GLib][Dataset] Remove bindings for internal classes

2021-02-26 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata reassigned ARROW-11782:


Assignee: Kenta Murata

> [GLib][Dataset] Remove bindings for internal classes
> 
>
> Key: ARROW-11782
> URL: https://issues.apache.org/jira/browse/ARROW-11782
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Affects Versions: 3.0.0
>Reporter: Ben Kietzman
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 4.0.0
>
>
> GLib and Ruby include bindings for internal classes such as ScanOptions, 
> ScanContext, InMemoryScanTask, ScanTask, ... These are probably unnecessary 
> and should be removed to present a simpler interface less prone to breakage 
> under refactoring of the wrapped classes 
> https://github.com/apache/arrow/pull/9532/checks?check_run_id=1974229719#step:8:2071





[jira] [Created] (ARROW-11686) [C++]flight-test-integration-client sometimes exits by SIGABRT but does not print the stack trace

2021-02-17 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11686:


 Summary: [C++]flight-test-integration-client sometimes exits by 
SIGABRT but does not print the stack trace
 Key: ARROW-11686
 URL: https://issues.apache.org/jira/browse/ARROW-11686
 Project: Apache Arrow
  Issue Type: Test
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata


I found that flight-test-integration-client sometimes exits with SIGABRT.
This problem started to occur at commit 
https://github.com/apache/arrow/commit/848c803bf162dca2e31cb63fbb3d2f9dbdda460e 
on the master branch, but the change in that commit seems unrelated to the 
problem.

To investigate this problem, I would like to let this command show the stack 
trace when it exits with SIGABRT.  This should be possible by calling the 
{{ArrowLog::InstallFailureSignalHandler}} function in the main function.





[jira] [Created] (ARROW-11685) [C++] Typo in future_test.cc

2021-02-17 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11685:


 Summary: [C++] Typo in future_test.cc
 Key: ARROW-11685
 URL: https://issues.apache.org/jira/browse/ARROW-11685
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata


FutureStessTest in future_test.cc should be FutureStressTest.





[jira] [Updated] (ARROW-11470) [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColumnMajorStrides, and CheckTensorStridesValidity

2021-02-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-11470:
-
Description: 
OSS-Fuzz reports that the integer multiplication in the ComputeRowMajorStrides 
function overflows.

https://oss-fuzz.com/testcase-detail/623225726408

The same issue exists in ComputeColumnMajorStrides.

Moreover, a similar overflow occurs in the CalculateValueOffset function, which 
is called from the CheckTensorStridesValidity function.

https://oss-fuzz.com/testcase-detail/6583463383793664

  was:
OSS-Fuzz reports that the integer multiplication in the ComputeRowMajorStrides 
function overflows.

https://oss-fuzz.com/testcase-detail/623225726408

The same issue exists in ComputeColumnMajorStrides.


> [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, 
> ComputeColumnMajorStrides, and CheckTensorStridesValidity
> -
>
> Key: ARROW-11470
> URL: https://issues.apache.org/jira/browse/ARROW-11470
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> OSS-Fuzz reports that the integer multiplication in the 
> ComputeRowMajorStrides function overflows.
> https://oss-fuzz.com/testcase-detail/623225726408
> The same issue exists in ComputeColumnMajorStrides.
> Moreover, a similar overflow occurs in the CalculateValueOffset function, 
> which is called from the CheckTensorStridesValidity function.
> https://oss-fuzz.com/testcase-detail/6583463383793664





[jira] [Updated] (ARROW-11470) [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, ComputeColumnMajorStrides, and CheckTensorStridesValidity

2021-02-02 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-11470:
-
Summary: [C++] Overflow occurs on integer multiplications in 
ComputeRowMajorStrides, ComputeColumnMajorStrides, and 
CheckTensorStridesValidity  (was: [C++] Overflow occurs on integer 
multiplications in Compute(Row|Column)MajorStrides)

> [C++] Overflow occurs on integer multiplications in ComputeRowMajorStrides, 
> ComputeColumnMajorStrides, and CheckTensorStridesValidity
> -
>
> Key: ARROW-11470
> URL: https://issues.apache.org/jira/browse/ARROW-11470
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> OSS-Fuzz reports that the integer multiplication in the 
> ComputeRowMajorStrides function overflows.
> https://oss-fuzz.com/testcase-detail/623225726408
> The same issue exists in ComputeColumnMajorStrides.





[jira] [Created] (ARROW-11470) [C++] Overflow occurs on integer multiplications in Compute(Row|Column)MajorStrides

2021-02-02 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-11470:


 Summary: [C++] Overflow occurs on integer multiplications in 
Compute(Row|Column)MajorStrides
 Key: ARROW-11470
 URL: https://issues.apache.org/jira/browse/ARROW-11470
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata


OSS-Fuzz reports that the integer multiplication in the ComputeRowMajorStrides 
function overflows.

https://oss-fuzz.com/testcase-detail/623225726408

The same issue exists in ComputeColumnMajorStrides.
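
To illustrate the kind of check that avoids this class of bug, here is a minimal Ruby sketch of a row-major stride computation that rejects products that would not fit in a signed 64-bit integer. It is not the Arrow C++ code: the function name and the explicit {{INT64_MAX}} comparison are assumptions made for illustration, since Ruby integers do not overflow by themselves.

```ruby
# Largest value representable in a signed 64-bit integer.
INT64_MAX = 2**63 - 1

# Hypothetical sketch: computes row-major strides in bytes for the given
# shape and element byte width, raising when a product exceeds int64.
def row_major_strides(shape, byte_width)
  strides = Array.new(shape.length)
  remaining = byte_width
  (shape.length - 1).downto(0) do |i|
    strides[i] = remaining
    remaining *= shape[i]
    # In C++ this multiplication is where the overflow would happen.
    raise RangeError, "stride computation overflows int64" if remaining > INT64_MAX
  end
  strides
end
```

For a shape of [2, 3, 4] with 8-byte elements, the strides are [96, 32, 8]. A column-major variant would walk the dimensions in the opposite order with the same check.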





[jira] [Updated] (ARROW-9642) [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the starting bit width of the indices

2020-08-04 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-9642:

Summary: [C++] Let MakeBuilder refer DictionaryType's index_type for 
deciding the starting bit width of the indices  (was: [C++] Let MakeBuilder 
refer DictionaryType's index_type to detect the starting bit width of the 
indices)

> [C++] Let MakeBuilder refer DictionaryType's index_type for deciding the 
> starting bit width of the indices
> --
>
> Key: ARROW-9642
> URL: https://issues.apache.org/jira/browse/ARROW-9642
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>






[jira] [Created] (ARROW-9642) [C++] Let MakeBuilder refer DictionaryType's index_type to detect the starting bit width of the indices

2020-08-04 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9642:
---

 Summary: [C++] Let MakeBuilder refer DictionaryType's index_type 
to detect the starting bit width of the indices
 Key: ARROW-9642
 URL: https://issues.apache.org/jira/browse/ARROW-9642
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata








[jira] [Updated] (ARROW-9499) [C++] AdaptiveIntBuilder::AppendNull does not increment the null count

2020-07-15 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-9499:

Summary: [C++] AdaptiveIntBuilder::AppendNull does not increment the null 
count  (was: [C++] AdaptiveIntBuilder::null_count does not return the null 
count)

> [C++] AdaptiveIntBuilder::AppendNull does not increment the null count
> --
>
> Key: ARROW-9499
> URL: https://issues.apache.org/jira/browse/ARROW-9499
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-9499) [C++] AdaptiveIntBuilder::null_count does not return the null count

2020-07-15 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9499:
---

 Summary: [C++] AdaptiveIntBuilder::null_count does not return the 
null count
 Key: ARROW-9499
 URL: https://issues.apache.org/jira/browse/ARROW-9499
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Kenta Murata
Assignee: Kenta Murata








[jira] [Created] (ARROW-9454) [GLib] Add binding of some dictionary builders

2020-07-13 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9454:
---

 Summary: [GLib] Add binding of some dictionary builders
 Key: ARROW-9454
 URL: https://issues.apache.org/jira/browse/ARROW-9454
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kenta Murata
Assignee: Kenta Murata








[jira] [Updated] (ARROW-9331) [C++] Improve the performance of Tensor-to-SparseTensor conversion

2020-07-06 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata updated ARROW-9331:

Description: In ARROW-9156, I rewrote Tensor-to-SparseTensor converters to 
reduce the library size. A drawback of that change is that it slowed down the 
conversion. We need an additional change to improve conversion speed.

> [C++] Improve the performance of Tensor-to-SparseTensor conversion
> --
>
> Key: ARROW-9331
> URL: https://issues.apache.org/jira/browse/ARROW-9331
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
>
> In ARROW-9156, I rewrote Tensor-to-SparseTensor converters to reduce the 
> library size. A drawback of that change is that it slowed down the 
> conversion. We need an additional change to improve conversion speed.





[jira] [Created] (ARROW-9331) [C++] Improve the performance of Tensor-to-SparseTensor conversion

2020-07-06 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9331:
---

 Summary: [C++] Improve the performance of Tensor-to-SparseTensor 
conversion
 Key: ARROW-9331
 URL: https://issues.apache.org/jira/browse/ARROW-9331
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata








[jira] [Created] (ARROW-9156) [C++] Reducing the code size of the tensor module

2020-06-16 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9156:
---

 Summary: [C++] Reducing the code size of the tensor module
 Key: ARROW-9156
 URL: https://issues.apache.org/jira/browse/ARROW-9156
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata








[jira] [Commented] (ARROW-8970) [C++] Reduce shared library / binary code size (umbrella issue)

2020-06-16 Thread Kenta Murata (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138062#comment-17138062
 ] 

Kenta Murata commented on ARROW-8970:
-

[~wesm] There are some options for reducing the code size of the tensor modules.

(1) Reducing inline functions.
(2) Reducing the number of index value types supported by sparse tensors.  I 
guess we needn't support 8-bit and 16-bit types for sparse tensor indices.  At 
least scipy's sparse matrix doesn't use 8-bit and 16-bit types for sparse 
matrix indices.  We should investigate other libraries providing sparse tensors.
(3) Dropping functions for converting between dense and sparse tensors.  These 
functions aren't necessary for exchanging existing tensor data between systems.

I will start (1) and (2) right now.  If we don't need to provide conversion 
functions, I will also start (3).

> [C++] Reduce shared library / binary code size (umbrella issue)
> ---
>
> Key: ARROW-8970
> URL: https://issues.apache.org/jira/browse/ARROW-8970
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> We're reaching a point where we may need to be careful about decisions that 
> increase code size:
> * Instantiating too many templates for code that isn't performance sensitive, 
> or where some templates may do the same thing (e.g. Int32Type kernels may do 
> the same thing as a Date32Type kernel)
> * Inlining functions that don't need to be inline
> Code size tends to correlate also with compilation times, but not always.
> I'll use this umbrella issue to organize issues related to reducing compiled 
> code size
> At this moment (2020-05-27), here are the 25 largest object files in a -O2 
> build
> {code}
> 524896    src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
> 531920    src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
> 552000    src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
> 575920    src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
> 595112    src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
> 645728    src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
> 683040    src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
> 702232    src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
> 729912    src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
> 752776    src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
> 752776    src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
> 877680    src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
> 885624    src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
> 919072    src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
> 941776    src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
> 1055248   src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
> 1233304   src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
> 1265160   src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
> 1343480   src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
> 1346928   src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
> 1502568   src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
> 1609760   src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
> 1794416   src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
> 2759552   src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
> 7609432   src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
> {code}





[jira] [Assigned] (ARROW-4221) [Format] Add canonical flag in COO sparse index

2020-06-16 Thread Kenta Murata (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenta Murata reassigned ARROW-4221:
---

Assignee: Kenta Murata

> [Format] Add canonical flag in COO sparse index
> ---
>
> Key: ARROW-4221
> URL: https://issues.apache.org/jira/browse/ARROW-4221
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Minor
>  Labels: sparse
> Fix For: 1.0.0
>
>
> To support the integration with scipy.sparse.coo_matrix, it is necessary to 
> add a flag in SparseCOOIndex.  This flag denotes whether the elements in a COO 
> sparse tensor are sorted lexicographically.
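
The condition the flag describes can be sketched in Ruby as a check over (row, column) coordinate pairs. The function below is hypothetical and takes separate row and column index arrays purely for illustration; it tests non-strict lexicographic order, so duplicate coordinates are accepted, which may or may not match what the format finally specifies.

```ruby
# Hypothetical helper: true when the COO coordinates, taken as
# (row, column) pairs, appear in non-decreasing lexicographic order.
def coo_sorted_lexicographically?(rows, cols)
  unless rows.length == cols.length
    raise ArgumentError, "rows and cols must have the same length"
  end
  # Array#<=> compares element by element, i.e. lexicographically.
  rows.zip(cols).each_cons(2).all? { |a, b| (a <=> b) <= 0 }
end
```

For instance, rows [0, 0, 1] with columns [0, 2, 1] are sorted, while rows [0, 0] with columns [2, 1] are not, because (0, 2) comes before (0, 1).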


