[GitHub] [tvm] tkonolige commented on a change in pull request #7334: [Relay, TOPI] Add numpy style cumsum op

2021-01-26 Thread GitBox


tkonolige commented on a change in pull request #7334:
URL: https://github.com/apache/tvm/pull/7334#discussion_r564640703



##
File path: python/tvm/topi/cuda/scan.py
##
@@ -251,99 +269,103 @@ def scan_thrust(data, output_dtype, exclusive=True, return_reduction=False):
 Whether or not do exclusive or inclusive scan.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a (N-1)-D tensor storing the reduction of each scan axis.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
-If False, reductions are ignored.
+If False, reductions are ignored. It must be False when exclusive is False.
+
+binop: function, optional
+A binary associative op to use for scan. Since we need to look up the corresponding
+thrust function, arbitrary callables are not supported. Currently only
+tvm.tir.generic.add can be passed in.
 
 Returns
 ---
 output : tvm.te.Tensor
-1-D tensor that is the exclusive scan of the input, or
-2-D tensor storing the exclusive scan of each row.
+An N-D tensor of the same rank N and shape as the input data.
 
 reduction : tvm.te.Tensor, optional
-1-D tensor storing the reduction of each row.
+(N-1)-D tensor storing the reduction of each scan axis.
 Returned if return_reduction is True.
 """
 data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
 output_buf = tvm.tir.decl_buffer(data.shape, output_dtype, "output_buf", data_alignment=8)
+
 output = te.extern(
 [data.shape],
 [data],
 lambda ins, outs: tvm.tir.call_packed(
-"tvm.contrib.thrust.sum_scan", ins[0], outs[0], exclusive
+_get_thrust_func_name(binop), ins[0], outs[0], exclusive
 ),
 dtype=[output_dtype],
 in_buffers=[data_buf],
 out_buffers=[output_buf],
-name="exclusive_sum_scan2d",
-tag="exclusive_sum_scan2d_gpu",
+name="exclusive_scan_thrust",
+tag="exclusive_scan_thrust_gpu",
 )
 
 if return_reduction:
 assert exclusive, "return_reduction should be False for inclusive scan"
-reduction = get_reduction_from_exclusive_scan(data, output)
+reduction = get_reduction_from_exclusive_scan(data, output, binop)
 return output, reduction
 
 return output
 
 
-def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None):
-"""Do exclusive scan on 1D input or along rows of 2D input.
+def exclusive_scan(
+data, axis=-1, return_reduction=False, output_dtype=None, binop=tvm.tir.generic.add
+):
+"""Do exclusive scan on 1D or multidimensional input.
 
 Parameters
 --
 data : tvm.te.Tensor
-Input data. 1-D tensor with shape [scan_axis_size], or
-2-D tensor with shape [batch_size, scan_axis_size].
+Input data of any shape.
 
 axis: int, optional
-The axis to do scan on. For now, only the inner most axis is supported.
+The axis to do scan on. By default, scan is done on the innermost axis.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a tensor storing the reduction over each scan axis.
+If the input rank is N, this tensor is of rank N - 1.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
 If False, reductions are ignored.
 
 output_dtype: string, optional
 The dtype of the output scan tensor. If not provided, the dtype of the input is used.
 
+binop: function, optional

Review comment:
   I think you should say that this defaults to add.

##
File path: python/tvm/relay/op/transform.py
##
@@ -1320,3 +1320,50 @@ def adv_index(inputs):
 Output tensor.
 """
 return _make.adv_index(Tuple(inputs))
+
+
+def cumsum(data, axis=None, dtype=None):
+"""Numpy style cumsum op. Return the cumulative inclusive sum of the 
elements along
+a given axis.
+
+Parameters
+--
+data : relay.Expr
+The input data to the operator.
+
+axis : int, optional
+Axis along which the cumulative sum is computed. The default (None) is to compute
+the cumsum over the flattened array.
+
+dtype : string, optional
+Type of the returned array and of the accumulator in which the elements are summed.
+If dtype is not specified, it defaults to the dtype of data.
+
+Returns
+---
+result : relay.Expr
+The result has the same size as data, and the same shape as data if axis is not None.
+If axis is None, the result is a 1-d array.
+
+Examples:

Review comment:
   I think this formatting is necessary for rst?
   ```sug

[GitHub] [tvm] tkonolige commented on a change in pull request #7334: [Relay, TOPI] Add numpy style cumsum op

2021-01-25 Thread GitBox


tkonolige commented on a change in pull request #7334:
URL: https://github.com/apache/tvm/pull/7334#discussion_r563900347



##
File path: python/tvm/relay/op/transform.py
##
@@ -1320,3 +1320,28 @@ def adv_index(inputs):
 Output tensor.
 """
 return _make.adv_index(Tuple(inputs))
+
+
+def cumsum(data, axis=None, dtype=None):
+"""Numpy style cumsum op. Return the cumulative sum of the elements along 
a given axis.

Review comment:
   Could you document that this is inclusive.

##
File path: python/tvm/relay/op/transform.py
##
@@ -1320,3 +1320,28 @@ def adv_index(inputs):
 Output tensor.
 """
 return _make.adv_index(Tuple(inputs))
+
+
+def cumsum(data, axis=None, dtype=None):
+"""Numpy style cumsum op. Return the cumulative sum of the elements along 
a given axis.

Review comment:
   Could you document that this is inclusive or exclusive.
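   For reference, the inclusive/exclusive distinction the review asks to document can be sketched in plain Python (hypothetical helper names; per the updated docstring, the PR's cumsum is the inclusive variant):

```python
from itertools import accumulate

def inclusive_scan(xs):
    # Inclusive scan (numpy-style cumsum): element i sums xs[0..i].
    return list(accumulate(xs))

def exclusive_scan(xs):
    # Exclusive scan: element i sums xs[0..i-1]; the first element is 0
    # and the total of the whole sequence is not part of the output.
    return list(accumulate(xs, initial=0))[:-1]

print(inclusive_scan([1, 2, 3, 4]))  # [1, 3, 6, 10]
print(exclusive_scan([1, 2, 3, 4]))  # [0, 1, 3, 6]
```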

##
File path: python/tvm/relay/op/transform.py
##
@@ -1320,3 +1320,28 @@ def adv_index(inputs):
 Output tensor.
 """
 return _make.adv_index(Tuple(inputs))
+
+
+def cumsum(data, axis=None, dtype=None):
+"""Numpy style cumsum op. Return the cumulative sum of the elements along 
a given axis.
+
+Parameters
+--
+data : relay.Expr
+The input data to the operator.
+
+axis : int, optional
+Axis along which the cumulative sum is computed. The default (None) is to compute
+the cumsum over the flattened array.
+
+dtype : string, optional
+Type of the returned array and of the accumulator in which the elements are summed.
+If dtype is not specified, it defaults to the dtype of data.

Review comment:
   I think some examples would be useful here.
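   The examples the reviewer asks for might look like this pure-Python sketch of the numpy cumsum semantics described in the docstring (axis=None flattens to a 1-D result; an integer axis keeps the input shape; helper names are illustrative):

```python
from itertools import accumulate

def cumsum_flat(data):
    # axis=None: cumsum over the flattened array; result is 1-D.
    return list(accumulate(x for row in data for x in row))

def cumsum_last_axis(data):
    # axis=1 on a 2-D input: cumsum along each row; result keeps data's shape.
    return [list(accumulate(row)) for row in data]

data = [[1, 2, 3], [4, 5, 6]]
print(cumsum_flat(data))       # [1, 3, 6, 10, 15, 21]
print(cumsum_last_axis(data))  # [[1, 3, 6], [4, 9, 15]]
```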

##
File path: python/tvm/topi/cuda/scan.py
##
@@ -19,30 +19,36 @@
 import tvm
 from tvm import te
 from tvm._ffi import get_global_func
-from ..transform import expand_dims, squeeze
-from ..utils import ceil_div
+from ..transform import expand_dims, squeeze, transpose, reshape
+from ..utils import ceil_div, swap, prod, get_const_int
 from ..math import cast
 from .. import tag
 from .injective import schedule_injective_from_existing
 
 
-def exclusive_sum_scan2d_ir(data, output, reduction=None):
+binop_name_to_func = {"sum": tvm.tir.generic.add}

Review comment:
   Instead of having a mapping from name to function, can we just pass the function directly?
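   The new imports of transpose/reshape and swap in this hunk suggest that scanning along an arbitrary axis is handled by moving that axis innermost, scanning there, and moving it back. A pure-Python sketch of that idea for the 2-D case (names are illustrative, not the PR's actual helpers):

```python
from itertools import accumulate

def scan_rows(data):
    # Inclusive scan along the innermost axis of a 2-D list.
    return [list(accumulate(row)) for row in data]

def scan_along_axis(data, axis):
    # Scan a 2-D list along either axis by transposing the scan axis to
    # the innermost position, scanning there, and transposing back.
    if axis == 1:
        return scan_rows(data)
    transposed = [list(col) for col in zip(*data)]
    scanned = scan_rows(transposed)
    return [list(row) for row in zip(*scanned)]

print(scan_along_axis([[1, 2], [3, 4]], axis=0))  # [[1, 2], [4, 6]]
print(scan_along_axis([[1, 2], [3, 4]], axis=1))  # [[1, 3], [3, 7]]
```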

##
File path: python/tvm/topi/cuda/scan.py
##
@@ -251,99 +263,98 @@ def scan_thrust(data, output_dtype, exclusive=True, return_reduction=False):
 Whether or not do exclusive or inclusive scan.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a (N-1)-D tensor storing the reduction of each scan axis.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
-If False, reductions are ignored.
+If False, reductions are ignored. It must be False when exclusive is False.
+
+binop: string, optional
+A string specifying which binary operator to use. Currently only "sum" is supported.
 
 Returns
 ---
 output : tvm.te.Tensor
-1-D tensor that is the exclusive scan of the input, or
-2-D tensor storing the exclusive scan of each row.
+An N-D tensor of the same rank N and shape as the input data.
 
 reduction : tvm.te.Tensor, optional
-1-D tensor storing the reduction of each row.
+(N-1)-D tensor storing the reduction of each scan axis.
 Returned if return_reduction is True.
 """
 data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
 output_buf = tvm.tir.decl_buffer(data.shape, output_dtype, "output_buf", data_alignment=8)
+binop_to_thrust_func_name = {"sum": "tvm.contrib.thrust.sum_scan"}
 output = te.extern(
 [data.shape],
 [data],
 lambda ins, outs: tvm.tir.call_packed(
-"tvm.contrib.thrust.sum_scan", ins[0], outs[0], exclusive
+binop_to_thrust_func_name[binop], ins[0], outs[0], exclusive
 ),
 dtype=[output_dtype],
 in_buffers=[data_buf],
 out_buffers=[output_buf],
-name="exclusive_sum_scan2d",
-tag="exclusive_sum_scan2d_gpu",
+name="exclusive_scan_thrust",
+tag="exclusive_scan_thrust_gpu",
 )
 
 if return_reduction:
 assert exclusive, "return_reduction should be False for inclusive scan"
-reduction = get_reduction_from_exclusive_scan(data, output)
+reduction = get_reduction_from_exclusive_scan(data, output, binop)
 return output, reduction
 
 return output
 
 
-def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None):
-"""Do exclusive scan on 1D input or along rows of 2D input.
+def exclusive_scan(data, axis=-1

[GitHub] [tvm] tkonolige commented on a change in pull request #7334: [Relay, TOPI] Add numpy style cumsum op

2021-01-25 Thread GitBox


tkonolige commented on a change in pull request #7334:
URL: https://github.com/apache/tvm/pull/7334#discussion_r564055585



##
File path: python/tvm/topi/cuda/scan.py
##
@@ -251,99 +263,98 @@ def scan_thrust(data, output_dtype, exclusive=True, return_reduction=False):
 Whether or not do exclusive or inclusive scan.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a (N-1)-D tensor storing the reduction of each scan axis.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
-If False, reductions are ignored.
+If False, reductions are ignored. It must be False when exclusive is False.
+
+binop: string, optional
+A string specifying which binary operator to use. Currently only "sum" is supported.
 
 Returns
 ---
 output : tvm.te.Tensor
-1-D tensor that is the exclusive scan of the input, or
-2-D tensor storing the exclusive scan of each row.
+An N-D tensor of the same rank N and shape as the input data.
 
 reduction : tvm.te.Tensor, optional
-1-D tensor storing the reduction of each row.
+(N-1)-D tensor storing the reduction of each scan axis.
 Returned if return_reduction is True.
 """
 data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
 output_buf = tvm.tir.decl_buffer(data.shape, output_dtype, "output_buf", data_alignment=8)
+binop_to_thrust_func_name = {"sum": "tvm.contrib.thrust.sum_scan"}
 output = te.extern(
 [data.shape],
 [data],
 lambda ins, outs: tvm.tir.call_packed(
-"tvm.contrib.thrust.sum_scan", ins[0], outs[0], exclusive
+binop_to_thrust_func_name[binop], ins[0], outs[0], exclusive
 ),
 dtype=[output_dtype],
 in_buffers=[data_buf],
 out_buffers=[output_buf],
-name="exclusive_sum_scan2d",
-tag="exclusive_sum_scan2d_gpu",
+name="exclusive_scan_thrust",
+tag="exclusive_scan_thrust_gpu",
 )
 
 if return_reduction:
 assert exclusive, "return_reduction should be False for inclusive scan"
-reduction = get_reduction_from_exclusive_scan(data, output)
+reduction = get_reduction_from_exclusive_scan(data, output, binop)
 return output, reduction
 
 return output
 
 
-def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None):
-"""Do exclusive scan on 1D input or along rows of 2D input.
+def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None, binop="sum"):
+"""Do exclusive scan on 1D or multidimensional input.
 
 Parameters
 --
 data : tvm.te.Tensor
-Input data. 1-D tensor with shape [scan_axis_size], or
-2-D tensor with shape [batch_size, scan_axis_size].
+Input data of any shape.
 
 axis: int, optional
-The axis to do scan on. For now, only the inner most axis is supported.
+The axis to do scan on. By default, scan is done on the innermost axis.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a tensor storing the reduction over each scan axis.
+If the input rank is N, this tensor is of rank N - 1.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
 If False, reductions are ignored.
 
 output_dtype: string, optional
 The dtype of the output scan tensor. If not provided, the dtype of the input is used.
 
+binop: string, optional
+A string specifying which binary operator to use. Currently only "sum" is supported.
+
 Returns
 ---
 output : tvm.te.Tensor
-1-D tensor that is the exclusive scan of the input, or
-2-D tensor storing the exclusive scan of each row.
+An N-D tensor of the same rank N and shape as the input data.
 
 reduction : tvm.te.Tensor, optional
-1-D tensor storing the reduction of each row.
+(N-1)-D tensor storing the reduction of each scan axis.
 Returned if return_reduction is True.
 """
-# TODO(masahi): Support other binary operators
-ndim = len(data.shape)
-if axis < 0:
-axis += ndim
-assert axis == ndim - 1, "Only support scan on the inner most axis."
-
-if output_dtype is None:
-output_dtype = data.dtype
 
-target = tvm.target.Target.current()
-if target and target.kind.name == "cuda" and is_thrust_available():
-return scan_thrust(data, output_dtype, exclusive=True, return_reduction=return_reduction)
+def do_scan(data, output_dtype):
+target = tvm.target.Target.current()
+if target and target.kind.name == "cuda" and is_thrust_available():

Review comment:
   That makes sense. It's too bad
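   The docstrings in this hunk note that the per-axis reductions come at no extra cost: for an exclusive scan, the running total that would land just past the end of each row is exactly that row's reduction. A pure-Python sketch of the idea (hypothetical helper name):

```python
from itertools import accumulate

def exclusive_scan_with_reduction(rows):
    # Exclusive scan along the last axis of a 2-D list; the dropped final
    # running total of each row is returned as the (N-1)-D reduction.
    scans, reductions = [], []
    for row in rows:
        acc = list(accumulate(row, initial=0))
        scans.append(acc[:-1])      # exclusive scan: drop the final total
        reductions.append(acc[-1])  # that total is the row's reduction
    return scans, reductions

scans, reds = exclusive_scan_with_reduction([[1, 2, 3], [4, 5, 6]])
print(scans)  # [[0, 1, 3], [0, 4, 9]]
print(reds)   # [6, 15]
```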

[GitHub] [tvm] tkonolige commented on a change in pull request #7334: [Relay, TOPI] Add numpy style cumsum op

2021-01-25 Thread GitBox


tkonolige commented on a change in pull request #7334:
URL: https://github.com/apache/tvm/pull/7334#discussion_r564052992



##
File path: python/tvm/topi/cuda/scan.py
##
@@ -19,30 +19,36 @@
 import tvm
 from tvm import te
 from tvm._ffi import get_global_func
-from ..transform import expand_dims, squeeze
-from ..utils import ceil_div
+from ..transform import expand_dims, squeeze, transpose, reshape
+from ..utils import ceil_div, swap, prod, get_const_int
 from ..math import cast
 from .. import tag
 from .injective import schedule_injective_from_existing
 
 
-def exclusive_sum_scan2d_ir(data, output, reduction=None):
+binop_name_to_func = {"sum": tvm.tir.generic.add}

Review comment:
   How about passing functions in, but using a mapping from function to thrust function. That way, when we add support for non-thrust code, we can just use the functions directly.
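   The design being discussed, passing the binary op as a callable while keeping a callable-to-packed-func mapping for the thrust path, could look roughly like this sketch (operator.add stands in for tvm.tir.generic.add; only the "tvm.contrib.thrust.sum_scan" name comes from the PR):

```python
import operator

# Map a binary op callable to the thrust packed function implementing its
# scan. Only addition is supported here, mirroring the PR's restriction.
BINOP_TO_THRUST_FUNC = {
    operator.add: "tvm.contrib.thrust.sum_scan",
}

def get_thrust_func_name(binop):
    # Look up the packed-func name; non-thrust code paths could instead
    # use the callable directly.
    if binop not in BINOP_TO_THRUST_FUNC:
        raise ValueError(f"no thrust scan registered for {binop!r}")
    return BINOP_TO_THRUST_FUNC[binop]

print(get_thrust_func_name(operator.add))  # tvm.contrib.thrust.sum_scan
```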





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [tvm] tkonolige commented on a change in pull request #7334: [Relay, TOPI] Add numpy style cumsum op

2021-01-25 Thread GitBox


tkonolige commented on a change in pull request #7334:
URL: https://github.com/apache/tvm/pull/7334#discussion_r563925901



##
File path: python/tvm/relay/op/transform.py
##
@@ -1320,3 +1320,28 @@ def adv_index(inputs):
 Output tensor.
 """
 return _make.adv_index(Tuple(inputs))
+
+
+def cumsum(data, axis=None, dtype=None):
+"""Numpy style cumsum op. Return the cumulative sum of the elements along 
a given axis.
+
+Parameters
+--
+data : relay.Expr
+The input data to the operator.
+
+axis : int, optional
+Axis along which the cumulative sum is computed. The default (None) is to compute
+the cumsum over the flattened array.
+
+dtype : string, optional
+Type of the returned array and of the accumulator in which the elements are summed.
+If dtype is not specified, it defaults to the dtype of data.

Review comment:
   I think some examples would be useful here.

##
File path: python/tvm/topi/cuda/scan.py
##
@@ -19,30 +19,36 @@
 import tvm
 from tvm import te
 from tvm._ffi import get_global_func
-from ..transform import expand_dims, squeeze
-from ..utils import ceil_div
+from ..transform import expand_dims, squeeze, transpose, reshape
+from ..utils import ceil_div, swap, prod, get_const_int
 from ..math import cast
 from .. import tag
 from .injective import schedule_injective_from_existing
 
 
-def exclusive_sum_scan2d_ir(data, output, reduction=None):
+binop_name_to_func = {"sum": tvm.tir.generic.add}

Review comment:
   Instead of having a mapping from name to function, can we just pass the 
function directly?

##
File path: python/tvm/topi/cuda/scan.py
##
@@ -251,99 +263,98 @@ def scan_thrust(data, output_dtype, exclusive=True, return_reduction=False):
 Whether or not do exclusive or inclusive scan.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a (N-1)-D tensor storing the reduction of each scan axis.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
-If False, reductions are ignored.
+If False, reductions are ignored. It must be False when exclusive is False.
+
+binop: string, optional
+A string specifying which binary operator to use. Currently only "sum" is supported.
 
 Returns
 ---
 output : tvm.te.Tensor
-1-D tensor that is the exclusive scan of the input, or
-2-D tensor storing the exclusive scan of each row.
+An N-D tensor of the same rank N and shape as the input data.
 
 reduction : tvm.te.Tensor, optional
-1-D tensor storing the reduction of each row.
+(N-1)-D tensor storing the reduction of each scan axis.
 Returned if return_reduction is True.
 """
 data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
 output_buf = tvm.tir.decl_buffer(data.shape, output_dtype, "output_buf", data_alignment=8)
+binop_to_thrust_func_name = {"sum": "tvm.contrib.thrust.sum_scan"}
 output = te.extern(
 [data.shape],
 [data],
 lambda ins, outs: tvm.tir.call_packed(
-"tvm.contrib.thrust.sum_scan", ins[0], outs[0], exclusive
+binop_to_thrust_func_name[binop], ins[0], outs[0], exclusive
 ),
 dtype=[output_dtype],
 in_buffers=[data_buf],
 out_buffers=[output_buf],
-name="exclusive_sum_scan2d",
-tag="exclusive_sum_scan2d_gpu",
+name="exclusive_scan_thrust",
+tag="exclusive_scan_thrust_gpu",
 )
 
 if return_reduction:
 assert exclusive, "return_reduction should be False for inclusive scan"
-reduction = get_reduction_from_exclusive_scan(data, output)
+reduction = get_reduction_from_exclusive_scan(data, output, binop)
 return output, reduction
 
 return output
 
 
-def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None):
-"""Do exclusive scan on 1D input or along rows of 2D input.
+def exclusive_scan(data, axis=-1, return_reduction=False, output_dtype=None, binop="sum"):
+"""Do exclusive scan on 1D or multidimensional input.
 
 Parameters
 --
 data : tvm.te.Tensor
-Input data. 1-D tensor with shape [scan_axis_size], or
-2-D tensor with shape [batch_size, scan_axis_size].
+Input data of any shape.
 
 axis: int, optional
-The axis to do scan on. For now, only the inner most axis is supported.
+The axis to do scan on. By default, scan is done on the innermost axis.
 
 return_reduction: bool, optional
-Whether or not return a 1-D tensor storing the reduction of each row.
+Whether or not return a tensor storing the reduction over each scan axis.
+If the input rank is N, this tensor is of rank N - 1.
 Reductions are computed as part of the upsweep pass, so there is no extra cost.
