[GitHub] [incubator-mxnet] RogerChern commented on issue #14363: Support multi-threading for Custom Operator
RogerChern commented on issue #14363: Support multi-threading for Custom Operator URL: https://github.com/apache/incubator-mxnet/pull/14363#issuecomment-480155465 Great work! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] wkcn commented on issue #14624: mx.sym.contrib.MultiProposal produces all the same outputs
wkcn commented on issue #14624: mx.sym.contrib.MultiProposal produces all the same outputs URL: https://github.com/apache/incubator-mxnet/issues/14624#issuecomment-480152809 This happens because the input is invalid. I tried to use MultiProposal in Faster RCNN, and it worked.
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
conv1d enhance | WIP
FC1d enhance | WIP
cache op | WIP
quantization flow to support 0 dim | WIP
SSD COCO model | WIP
multiply | Maybe
repeat | Maybe
split | Maybe
expand_dim | Maybe

FP32 optimization

Name | PR#
-- | --
transpose | #14545
RNN refactor with NNVM | #14476
reshape enhance | WIP
sum1d | WIP
slice1d | WIP
MKL Math (ERF, mean, etc) | under discussion

@eric-haibin-lin
[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #14496: performance degradation from 1.3.1 to 1.4.0
pengzhao-intel commented on issue #14496: performance degradation from 1.3.1 to 1.4.0 URL: https://github.com/apache/incubator-mxnet/issues/14496#issuecomment-480141795 @eric-haibin-lin FYI, https://github.com/apache/incubator-mxnet/pull/14545#issuecomment-479921467 improved the performance of transpose significantly via MKLDNN.
[GitHub] [incubator-mxnet] pilhoon opened a new issue #14624: mx.sym.contrib.MultiProposal produces all the same outputs
pilhoon opened a new issue #14624: mx.sym.contrib.MultiProposal produces all the same outputs URL: https://github.com/apache/incubator-mxnet/issues/14624

## Description
mx.sym.contrib.MultiProposal produces all the same outputs. It is expected that this method extracts varied rois.

## Environment info (Required)

--Python Info--
Version : 3.7.3
Compiler : GCC 7.3.0
Build : ('default', 'Mar 27 2019 22:11:17')
Arch : ('64bit', '')

--Pip Info--
Version : 19.0.3
Directory : /home/x/anaconda3/lib/python3.7/site-packages/pip

--MXNet Info--
Version : 1.4.0
Directory : /home/x/anaconda3/lib/python3.7/site-packages/mxnet
Commit Hash : a03d59ed867ba334d78d61246a1090cd1868f5da

--System Info--
Platform : Linux-3.10.0-693.21.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
system : Linux
node : ***
release : 3.10.0-693.21.1.el7.x86_64
version : #1 SMP Wed Mar 7 19:03:37 UTC 2018

--Hardware Info--
machine : x86_64
processor : x86_64
Architecture : x86_64
CPU op-mode(s) : 32-bit, 64-bit
Byte Order : Little Endian
CPU(s) : 20
On-line CPU(s) list : 0-19
Thread(s) per core : 1
Core(s) per socket : 10
Socket(s) : 2
NUMA node(s) : 2
Vendor ID : GenuineIntel
CPU family : 6
Model : 79
Model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
Stepping : 1
CPU MHz : 1200.000
CPU max MHz : 2201.
CPU min MHz : 1200.
BogoMIPS : 4399.74
Virtualization : VT-x
L1d cache : 32K
L1i cache : 32K
L2 cache : 256K
L3 cache : 25600K
NUMA node0 CPU(s) : 0-9
NUMA node1 CPU(s) : 10-19
Flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt spec_ctrl ibpb_support tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

--Network Test--
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0009 sec, LOAD: 0.8751 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1852 sec, LOAD: 0.9262 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1460 sec, LOAD: 0.7327 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1050 sec, LOAD: 0.2781 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0372 sec, LOAD: 1.2177 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0046 sec, LOAD: 0.0518 sec.
## Package used (Python/R/Scala/Julia): python 3.7

## Minimum reproducible example
```python
import mxnet as mx
from mxnet import gluon, nd

prob = mx.sym.var('prob')
pred = mx.sym.var('pred')
im_info = mx.sym.var('im_info')
feature_stride = 100
s = (1, 3)
r = (0.5, 2)
num_anchors = len(s) * len(r)
rois = mx.sym.contrib.MultiProposal(cls_prob=prob, bbox_pred=pred, im_info=im_info,
                                    feature_stride=feature_stride, scales=s, ratios=r)
net = gluon.SymbolBlock(inputs=[prob, pred, im_info], outputs=[rois])
net.initialize()

w = 5  # feature map width
h = 4  # feature map height
prob = nd.random.uniform(0, 1, (num_anchors, h, w))
prob = nd.expand_dims(nd.concat(prob, 1 - prob, dim=0), axis=0)
print('prob shape:', prob.shape)
pred = nd.random.uniform(10, 400, (1, 4 * num_anchors, h, w))
print('bbox shape:', pred.shape)
im_info = nd.array([[3, h * feature_stride, w * feature_stride]])
a = net(prob, pred, im_info)
print(a)
```

## Steps to reproduce
(Paste the commands you ran that produced the error.)
1. run the above code

## What have you tried to solve it?
1. view cpp source
2. run this on multiple machines
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
conv1d enhance | WIP
FC1d enhance | WIP
cache op | WIP
quantization flow to support 0 dim | WIP
SSD COCO model | WIP
multiply | Maybe
repeat | Maybe
split | Maybe
expand_dim | Maybe
[GitHub] [incubator-mxnet] wkcn edited a comment on issue #9686: [Discussion] MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0)
wkcn edited a comment on issue #9686: [Discussion] MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0) URL: https://github.com/apache/incubator-mxnet/issues/9686#issuecomment-480127095 I have some suggestions. 1. Custom Operator for deployment Currently, MXNet doesn't support the custom operator for deployment unless rebuilding from the source. Although there is the `MXCustomOpRegister` API, it is inconvenient and does not provide `mshadow::Stream`, which is important for executing an operator asynchronously. We need an approach (easy to write and compile) to support custom operator for deployment. 2. Refactor of the C++ package I hope that the syntax of the C++ package is the same as that of the Python package, and it will be more friendly for C++ users.
[GitHub] [incubator-mxnet] wkcn commented on issue #9686: [Discussion] MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0)
wkcn commented on issue #9686: [Discussion] MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0) URL: https://github.com/apache/incubator-mxnet/issues/9686#issuecomment-480127095 I have some suggestions. 1. Custom Operator for deployment Currently, MXNet doesn't support the custom operator for deployment unless rebuilding from the source. Although there is the `MXCustomOpRegister` API, it is inconvenient and does not provide `mshadow::Stream`, which is important for executing an operator asynchronously. We need an approach (easy to write and compile) to support custom operator for deployment. 2. Refactor of the C++ package I hope that the syntax of the C++ package is the same as that of the Python package, and it will be more friendly for C++ users.
[GitHub] [incubator-mxnet] wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation
wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610#issuecomment-480124120 It is interesting that the operator is executed immediately, and there is no lazy evaluation for `mx.nd.dot`. Closing the issue. @lanking520 @eric-haibin-lin @anirudh2290 Thank you so much!
[GitHub] [incubator-mxnet] wkcn closed issue #14610: [Feature Request] Disable lazy evaluation
wkcn closed issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610
[GitHub] [incubator-mxnet] wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation
wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610#issuecomment-480120964 @anirudh2290 Yes. Before my experiment, I wrongly thought the total time was op(T1) + sleep(T2). If the total time were T1 + T2, we should disable lazy evaluation to accelerate the execution. However, I ran an experiment and found that the operator executes in parallel with the sleep thread. It is confusing; I do not know when the operator executes.
[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #14496: performance degradation from 1.3.1 to 1.4.0
eric-haibin-lin commented on issue #14496: performance degradation from 1.3.1 to 1.4.0 URL: https://github.com/apache/incubator-mxnet/issues/14496#issuecomment-480118378 Is there any fix for transpose? I noticed that transpose now takes a significant amount of time in BERT.
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository.

zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git

The following commit(s) were added to refs/heads/asf-site by this push:
new e250550 Bump the publish timestamp.

e250550 is described below

commit e250550f6c086aa4a4abd2ffaaf09520cbfff69a
Author: mxnet-ci
AuthorDate: Fri Apr 5 01:16:03 2019 +

Bump the publish timestamp.
---
date.txt | 1 +
1 file changed, 1 insertion(+)

diff --git a/date.txt b/date.txt
new file mode 100644
index 000..6e32289
--- /dev/null
+++ b/date.txt
@@ -0,0 +1 @@
+Fri Apr 5 01:16:03 UTC 2019
[GitHub] [incubator-mxnet] anirudh2290 commented on issue #14610: [Feature Request] Disable lazy evaluation
anirudh2290 commented on issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610#issuecomment-480116403 @wkcn I am not sure I understand your question. As your numbers show, the op seems to be executing in parallel with the sleep thread. So when you execute op(T1) + sleep(T2), you don't see T1 + T2 but something close to T2 if T2 > T1. For example, op execution time without sleep was 2 seconds, and op execution time with sleep was 5 seconds. Also, op execution without sleep was 0.708 seconds, and with a 5-second sleep the total was close to 5 seconds. Are you asking if you can somehow avoid the wait_to_read call? Yes, you can avoid it, but there is no guarantee that the computation has finished unless you are using the naive engine.
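[Editor's sketch] The overlap described above can be reproduced without MXNet at all: if the "operator" runs on a worker thread while the main thread sleeps, total wall time lands near max(T1, T2) rather than T1 + T2. The sketch below uses plain `threading` as a stand-in for the asynchronous engine:

```python
import threading
import time

T1, T2 = 0.2, 0.4  # "op" time and sleep time

def fake_op():
    # Stands in for the operator that the engine schedules asynchronously.
    time.sleep(T1)

start = time.time()
worker = threading.Thread(target=fake_op)
worker.start()   # engine dispatches the op in the background
time.sleep(T2)   # main thread does something else meanwhile
worker.join()    # analogous to wait_to_read()
total = time.time() - start

# total is close to max(T1, T2) = T2 here, not T1 + T2
print("total: %.2f s" % total)
```

This mirrors the measurements in the thread: with the sleep in place, `wait_to_read` returns almost immediately because the op already finished while the main thread was sleeping.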
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
conv1d enhance | WIP
cache op | WIP
quantization flow to support 0 dim | WIP
SSD COCO model | WIP
multiply | Maybe
repeat | Maybe
split | Maybe
expand_dim | Maybe
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
conv1d enhance | WIP
cache op | WIP
SSD COCO model | WIP
multiply | Maybe
repeat | Maybe
split | Maybe
expand_dim | Maybe
[GitHub] [incubator-mxnet] larroy commented on issue #14601: [WIP] Add test for gemm overflow.
larroy commented on issue #14601: [WIP] Add test for gemm overflow. URL: https://github.com/apache/incubator-mxnet/pull/14601#issuecomment-480111314 @mxnet-label-bot add [pr-awaiting-review]
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
conv1d enhance | WIP
cache op | WIP
SSD COCO model | WIP
[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel commented on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

sum | #14614
-- | --
relu | #14604
refactor requantize | #14608
conv + activation | WIP
cache op | WIP
SSD COCO model | WIP
[GitHub] [incubator-mxnet] pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap
pengzhao-intel edited a comment on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480110642

MKLDNN Quantization PR

Name | PR#
-- | --
sum | #14614
relu | #14604
refactor requantize | #14608
conv + activation | WIP
cache op | WIP
SSD COCO model | WIP
[GitHub] [incubator-mxnet] wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation
wkcn commented on issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610#issuecomment-480110594

Test Code:
```
import mxnet as mx
import time

N = 4000
a = mx.nd.zeros((N, N))
b = mx.nd.zeros((N, N))
while 1:
    tic = time.time()
    c = mx.nd.dot(a, b)
    time.sleep(5)
    tic2 = time.time()
    c.wait_to_read()
    print("wait_to_read", time.time() - tic2)
    print(time.time() - tic)
```

Engine | calling `time.sleep(5)` | wait_to_read time | total time
-- | -- | -- | --
ThreadedEnginePerDevice | Yes | 4e-5 | 5.01
ThreadedEnginePerDevice | No | 0.705 | 0.708
NaiveEngine | Yes | 4e-5 | 5.7
NaiveEngine | No | 5.96e-6 | 0.68
[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #14610: [Feature Request] Disable lazy evaluation
eric-haibin-lin commented on issue #14610: [Feature Request] Disable lazy evaluation URL: https://github.com/apache/incubator-mxnet/issues/14610#issuecomment-480105752 MXNET_ENGINE_TYPE=NaiveEngine?
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406919

## File path: python/mxnet/gluon/estimator/event_handler.py
## @@ -98,29 +104,33 @@
     def train_end(self):
         pass

     def batch_begin(self):
-        self.batch_start = time.time()
+        if self.verbose == 2:

Review comment: As per the discussion, I have added it as a constant so the end user can directly pass that as a parameter instead of a number. Could you please have a look again? Thanks!
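[Editor's sketch] A constant-based verbose interface of the kind discussed in this review could look like the following. The constant names and the `LoggingHandler` shape here are illustrative assumptions, not the merged MXNet code:

```python
import time

# Hypothetical named verbosity levels, in the spirit of the change discussed:
# the user passes a named constant rather than a bare integer.
LOG_VERBOSITY_PER_EPOCH = 1
LOG_VERBOSITY_PER_BATCH = 2

class LoggingHandler:
    def __init__(self, verbose=LOG_VERBOSITY_PER_EPOCH):
        if verbose not in (LOG_VERBOSITY_PER_EPOCH, LOG_VERBOSITY_PER_BATCH):
            raise ValueError("verbose must be LOG_VERBOSITY_PER_EPOCH "
                             "or LOG_VERBOSITY_PER_BATCH")
        self.verbose = verbose
        self.batch_start = None

    def batch_begin(self):
        # Only track per-batch timing when batch-level logging is requested.
        if self.verbose == LOG_VERBOSITY_PER_BATCH:
            self.batch_start = time.time()
```

Passing `LoggingHandler(verbose=LOG_VERBOSITY_PER_BATCH)` then reads as self-documenting at the call site, which is the point of the constant.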
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406719

## File path: python/mxnet/gluon/estimator/event_handler.py
## @@ -76,14 +76,20 @@
 class LoggingHandler(EventHandler):
         file name to save the logs
     file_location: str
         file location to save the logs
+    verbose: int, default 1
+        Limit the display level of training progress
+        verbose=0: display nothing (silent)

Review comment: Removed the verbose `0` option, as it makes no sense to add it. Thanks for your input.
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406569

## File path: tests/python/unittest/test_gluon_estimator.py
## @@ -275,3 +278,44 @@ def test_context():
                   loss=loss,
                   metrics=metrics,
                   context='cpu')
+
+
+def test_batch_size():
+    '''Test batch size'''
+    num_samples = 32
+
+    # No Data Loader
+    data = mx.nd.random.uniform(shape=(num_samples, 3, 28, 28))
+    label = mx.nd.random.randint(low=0, high=2, shape=(num_samples,))
+    data_iter = mx.io.NDArrayIter(data=data, label=label, batch_size=16)
+    net = get_model()
+    loss = mx.gluon.loss.L2Loss()
+    ctx = mx.cpu()
+    est = estimator.Estimator(net=net, loss=loss, context=ctx)
+    with assert_raises(ValueError):
+        est.fit(train_data=data_iter)
+
+    # Empty data loader
+    data = mx.nd.random.uniform(shape=(0,))
+    label = mx.nd.random.randint(low=0, high=2, shape=(0,))
+    batch_size = 2
+    data_arr = mx.gluon.data.dataset.ArrayDataset(data, label)
+    data_loader = mx.gluon.data.DataLoader(data_arr, batch_size=batch_size)
+    est = estimator.Estimator(net=net, loss=loss, context=ctx)
+    with assert_raises(ValueError):
+        est.fit(train_data=data_loader)
+
+    # Batch size less than context
+    ctx = [mx.gpu(i) for i in range(4)]
+    data = mx.nd.random.uniform(shape=(num_samples, 3, 28, 28))
+    label = mx.nd.random.randint(low=0, high=2, shape=(num_samples,))
+    batch_size = 2
+    data_arr = mx.gluon.data.dataset.ArrayDataset(data, label)
+    data_loader = mx.gluon.data.DataLoader(data_arr, batch_size=batch_size)
+    est = estimator.Estimator(net=net, loss=loss, context=ctx)

Review comment: Yes, fixed in all the tests. Thanks.
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406492

## File path: python/mxnet/gluon/estimator/estimator.py
## @@ -241,16 +274,23 @@ def fit(self, train_data,
         from a data batch and load into contexts (devices)
         """
-        self.epochs = epochs
-        if not batch_size:
-            batch_size = 32 * len(self.context)
+        num_batches, total_samples, batch_size = self._infer_data_info(train_data)
+
+        if isinstance(self.context, list):

Review comment: No longer relevant. Modified the design.
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406515

## File path: python/mxnet/gluon/estimator/estimator.py
## @@ -175,14 +174,51 @@
         label = gluon.utils.split_and_load(label, ctx_list=ctx, batch_axis=0)
         return data, label

+    def _infer_data_info(self, data):
+        """Retrieve the data information such as batch size,
+        number of batches, and total number of samples
+
+        Parameters
+        ----------
+        data : DataLoader
+            A DataLoader instance with data and/or label
+
+        Returns
+        -------
+        num_batches: int
+            Number of batches the data is divided into
+        total_samples: int
+            Total number of samples
+        batch_size: int
+            Batch size
+        """
+        if isinstance(data, gluon.data.DataLoader):
+            if isinstance(data._dataset, gluon.data.ArrayDataset):

Review comment: No longer relevant. Modified the design.
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406509

## File path: python/mxnet/gluon/estimator/estimator.py
## @@ -175,14 +174,51 @@
         label = gluon.utils.split_and_load(label, ctx_list=ctx, batch_axis=0)
         return data, label

+    def _infer_data_info(self, data):
+        """Retrieve the data information such as batch size,
+        number of batches, and total number of samples
+
+        Parameters
+        ----------
+        data : DataLoader
+            A DataLoader instance with data and/or label
+
+        Returns
+        -------
+        num_batches: int
+            Number of batches the data is divided into
+        total_samples: int
+            Total number of samples
+        batch_size: int
+            Batch size
+        """
+        if isinstance(data, gluon.data.DataLoader):
+            if isinstance(data._dataset, gluon.data.ArrayDataset):
+                total_samples = data._dataset._data[0].shape[0]
+            elif isinstance(data._dataset, nd.ndarray.NDArray):

Review comment: No longer relevant. Modified the design.
[GitHub] [incubator-mxnet] karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
karan6181 commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272406229

## File path: tests/python/unittest/test_gluon_estimator.py
## @@ -260,7 +251,8 @@ def test_context():
                   loss=loss,
                   metrics=metrics)
     # input list of context
-    ctx = [mx.gpu(0), mx.gpu(1)]
+    ctx = [mx.cpu() for _ in range(2)]

Review comment: Done. Thanks!
[GitHub] [incubator-mxnet] wkcn edited a comment on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
wkcn edited a comment on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#issuecomment-480093964 @yuxihu Could we add a callback function, which will be called to release the resource when the execution is finished? There is an example in TVM project: https://github.com/dmlc/tvm/blob/master/src/runtime/c_runtime_api.cc#L443
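[Editor's sketch] The completion-callback pattern being suggested can be sketched in plain Python. The `push_async`/`on_complete` names here are made up for illustration (the TVM link above shows the real pattern in C++); the point is that the callback releases the resource exactly when the asynchronously pushed work finishes:

```python
import threading

def push_async(work_fn, on_complete):
    """Run work_fn on a worker thread, then invoke on_complete to release
    any resource the caller handed over for the duration of the work.
    Illustrative stand-in for an engine's PushAsync with a completion hook."""
    def runner():
        try:
            work_fn()
        finally:
            on_complete()  # always release, even if the work raises
    t = threading.Thread(target=runner)
    t.start()
    return t

released = []
buf = bytearray(1024)  # stands in for a resource kept alive during execution

t = push_async(lambda: sum(buf), lambda: released.append(True))
t.join()
print("resource released:", released == [True])
```

Tying the release to a callback rather than to the caller's scope is what makes this safe for asynchronous execution: the caller cannot know when the engine is done with the pointer.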
[GitHub] [incubator-mxnet] stu1130 opened a new pull request #14623: [WIP][Dependency Update] Upgrade the libtiff to 4.0.10
stu1130 opened a new pull request #14623: [WIP][Dependency Update] Upgrade the libtiff to 4.0.10 URL: https://github.com/apache/incubator-mxnet/pull/14623

## Description ##
Upgrade the libtiff package to **4.0.10** due to a number of issues in 4.0.9:
1. [tif_jbig.c JBIGDecode out-of-bounds write](https://gitlab.com/libtiff/libtiff/merge_requests/38)
2. [two out-of-bounds writes in cpTags in tools/tiff2bw.c and tools/pal2rgb.c](https://gitlab.com/libtiff/libtiff/merge_requests/33/diffs?commit_id=f1b94e8a3ba49febdd3361c0214a1d1149251577)

Please find more on [CVE](https://www.cvedetails.com/vulnerability-list/vendor_id-2224/Libtiff.html)

## Checklist ##
### Essentials ###
- [ ] Test build with Ubuntu 14.04
- [ ] Test build with Ubuntu 16.04

### Changes ###
* GitLab doesn't provide a version 4.0.10 zip file, so a mirror from the [official website](http://www.libtiff.org/) is used

## Comments ##
[GitHub] [incubator-mxnet] wkcn commented on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
wkcn commented on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#issuecomment-480093964 @yuxihu Could we add a callback function, which will be called to release the resource when the execution is finished?
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #14373: Passing parameters to HybridBlocks and not using them
abhinavs95 commented on issue #14373: Passing parameters to HybridBlocks and not using them URL: https://github.com/apache/incubator-mxnet/issues/14373#issuecomment-480092913 I am trying to figure out whether this is actually a bug and whether there is a possible workaround for this use case. @sandeep-krishnamurthy @safrooze Could you please have a look?
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #14585: Fix aspect ratio sampling for RandomResizedCrop
abhinavs95 commented on issue #14585: Fix aspect ratio sampling for RandomResizedCrop URL: https://github.com/apache/incubator-mxnet/pull/14585#issuecomment-480089688 @szha could you have a look and merge? Thanks @mxnet-label-bot update [pr-awaiting-merge]
[GitHub] [incubator-mxnet] reminisce opened a new pull request #14622: [DO NOT MERGE] Test numpy branch CI
reminisce opened a new pull request #14622: [DO NOT MERGE] Test numpy branch CI URL: https://github.com/apache/incubator-mxnet/pull/14622 **DO NOT MERGE** For the purpose of running CI only.
[GitHub] [incubator-mxnet] piyushghai commented on a change in pull request #14592: Add BERT QA Scala/Java example
piyushghai commented on a change in pull request #14592: Add BERT QA Scala/Java example URL: https://github.com/apache/incubator-mxnet/pull/14592#discussion_r272391955 ## File path: scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/bert/BertQA.java ## @@ -0,0 +1,148 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.mxnetexamples.javaapi.infer.bert; + +import org.apache.mxnet.infer.javaapi.Predictor; +import org.apache.mxnet.javaapi.*; +import org.kohsuke.args4j.CmdLineParser; +import org.kohsuke.args4j.Option; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.*; + +/** + * This is an example of using BERT to do the general Question and Answer inference jobs + * Users can provide a question with a paragraph contains answer to the model and + * the model will be able to find the best answer from the answer paragraph + */ +public class BertQA { +@Option(name = "--model-path-prefix", usage = "input model directory and prefix of the model") +private String modelPathPrefix = "/model/static_bert_qa"; +@Option(name = "--model-epoch", usage = "Epoch number of the model") +private int epoch = 2; +@Option(name = "--model-vocab", usage = "the vocabulary used in the model") +private String modelVocab = "/model/vocab.json"; +@Option(name = "--input-question", usage = "the input question") +private String inputQ = "When did BBC Japan start broadcasting?"; +@Option(name = "--input-answer", usage = "the input answer") +private String inputA = +"BBC Japan was a general entertainment Channel.\n" + +" Which operated between December 2004 and April 2006.\n" + +"It ceased operations after its Japanese distributor folded."; +@Option(name = "--seq-length", usage = "the maximum length of the sequence") +private int seqLength = 384; + +private final static Logger logger = LoggerFactory.getLogger(BertQA.class); +private static NDArray$ NDArray = NDArray$.MODULE$; + +private static int argmax(float[] prob) { +int maxIdx = 0; +for (int i = 0; i < prob.length; i++) { +if (prob[maxIdx] < prob[i]) maxIdx = i; +} +return maxIdx; +} + +/** + * Do the post processing on the output, apply softmax to get the probabilities + * reshape and get the most probable index + * @param result prediction result + * @param tokens word tokens + * @return Answers clipped 
from the original paragraph + */ + +package org.apache.mxnetexamples.javaapi.infer.bert; + +import org.apache.mxnet.infer.javaapi.Predictor; +import org.apache.mxnet.javaapi.*; +import org.kohsuke.args4j.CmdLineParser; +import org.kohsuke.args4j.Option; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.*; + +/** + * This is an example of using BERT to do the general Question and Answer inference jobs + * Users can provide a question with a paragraph contains answer to the model and + * the model will be able to find the best answer from the answer paragraph + */ +public class BertQA { +@Option(name = "--model-path-prefix", usage = "input model directory and prefix of the model") +private String modelPathPrefix = "/model/static_bert_qa"; +@Option(name = "--model-epoch", usage = "Epoch number of the model") +private int epoch = 2; +@Option(name = "--model-vocab", usage = "the vocabulary used in the model") +private String modelVocab = "/model/vocab.json"; +@Option(name = "--input-question", usage = "the input question") +private String inputQ = "When did BBC Japan start broadcasting?"; +@Option(name = "--input-answer", usage = "the input answer") +private String inputA = +"BBC Japan was a general entertainment Channel.\n" + +" Which operated between December 2004 and April 2006.\n" + +"It ceased operations after its Japanese distributor folded."; +@Option(name = "--seq-length", usage = "the maximum length of the sequence") +private int seqLength = 384; + +private final static Logger logger = LoggerFactory.getLogger(BertQA.class); +private static NDArray$ NDArray = NDArray$.MODULE$; + +private static int argmax(float[] prob) { +int maxIdx = 0; +for (int i = 0; i < prob.length; i++) { +if (prob[maxIdx] < prob[i]) maxIdx = i; +} +return maxIdx; +} + +/** + * Do the post processing on the output, apply softmax to get the probabilities + * reshape and get the most probable index + * @param result prediction result + * @param tokens word tokens + * @return Answers clipped from the original paragraph + */ +static List postProcessing(NDArray result, List tokens) { +NDArray[] output = NDArray.split( +NDArray.new splitParam(result, 2).setAxis(2)); +// Get the formatted logits result +NDArray startLogits = output[0].reshape(new int[]{0, -3}); +NDArray endLogits = output[1].reshape(new int[]{0, -3}); +// Get Probability distribution +float[] startProb = NDArray.softmax( +NDArray.new softmaxParam(startLogits))[0].toArray(); +float[] endProb = NDArray.softmax( +NDArray.new softmaxParam(endLogits))[0].toArray(); +int startIdx = argmax(startProb); +int endIdx = argmax(endProb); +return tokens.subList(startIdx, endIdx + 1); +} + +public static void main(String[] args) throws Exception{ +BertQA inst = new BertQA(); +CmdLineParser parser = new CmdLineParser(inst); +parser.parseArgument(args); +BertDataParser util = new BertDataParser(); Review comment: nit : ```util --> dataparser ``` just a more meaningful variable name :)
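The post-processing under review boils down to softmax over the start/end logits, an argmax on each, and clipping the answer span out of the token list. A plain-Python sketch with made-up logits (not the example's real model outputs):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a 1-D list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(probs):
    best = 0
    for i, p in enumerate(probs):
        if p > probs[best]:
            best = i
    return best

# Hypothetical tokens and logits, peaking at "december" (start) and "2004" (end).
tokens = ["bbc", "japan", "started", "in", "december", "2004", "."]
start_logits = [0.1, 0.2, 0.1, 0.3, 2.5, 0.4, 0.1]
end_logits   = [0.1, 0.1, 0.2, 0.2, 0.3, 2.8, 0.1]

start = argmax(softmax(start_logits))
end = argmax(softmax(end_logits))
answer = tokens[start:end + 1]   # inclusive end index, like subList(start, end + 1)
print(answer)   # ['december', '2004']
```

The Java code does the same thing after splitting the network output into start and end logit arrays.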
[GitHub] [incubator-mxnet] lanking520 commented on issue #7653: ipython, The kernel appears to have died. It will restart automatically.
lanking520 commented on issue #7653: ipython, The kernel appears to have died. It will restart automatically. URL: https://github.com/apache/incubator-mxnet/issues/7653#issuecomment-480085394 Closing this issue due to inactivity. Please try with the up-to-date package and use blas version 0.33+ in your build. Please feel free to comment or reopen it if you are still facing the same problems.
[GitHub] [incubator-mxnet] lanking520 closed issue #7653: ipython, The kernel appears to have died. It will restart automatically.
lanking520 closed issue #7653: ipython, The kernel appears to have died. It will restart automatically. URL: https://github.com/apache/incubator-mxnet/issues/7653
[GitHub] [incubator-mxnet] lanking520 commented on issue #7591: Error:include/mxnet/./ndarray.h:109: Unknown type enum 1794200432
lanking520 commented on issue #7591: Error:include/mxnet/./ndarray.h:109: Unknown type enum 1794200432 URL: https://github.com/apache/incubator-mxnet/issues/7591#issuecomment-480084992 @lance-cs-wz @tianhaijie Have you tried the most up-to-date package? Are you still facing the problem?
[GitHub] [incubator-mxnet] lanking520 commented on issue #7044: Error compiling mxnet
lanking520 commented on issue #7044: Error compiling mxnet URL: https://github.com/apache/incubator-mxnet/issues/7044#issuecomment-480084390 @BenEngbers Are you still facing the same problem?
[GitHub] [incubator-mxnet] lanking520 commented on issue #6917: ImportError: You used to compile with protoc --python_out=./ ./caffe.proto
lanking520 commented on issue #6917: ImportError: You used to compile with protoc --python_out=./ ./caffe.proto URL: https://github.com/apache/incubator-mxnet/issues/6917#issuecomment-480083908 Closing this issue due to inactivity. We now have full ONNX support: you can convert your model to that format and use it in MXNet. Please feel free to reopen it if you are still facing the problem or would like to improve the script.
[GitHub] [incubator-mxnet] lanking520 closed issue #6917: ImportError: You used to compile with protoc --python_out=./ ./caffe.proto
lanking520 closed issue #6917: ImportError: You used to compile with protoc --python_out=./ ./caffe.proto URL: https://github.com/apache/incubator-mxnet/issues/6917
[GitHub] [incubator-mxnet] aaronmarkham commented on a change in pull request #14592: Add BERT QA Scala/Java example
aaronmarkham commented on a change in pull request #14592: Add BERT QA Scala/Java example URL: https://github.com/apache/incubator-mxnet/pull/14592#discussion_r272388796 ## File path: scala-package/examples/src/main/java/org/apache/mxnetexamples/javaapi/infer/bert/README.md ## @@ -0,0 +1,101 @@
+# Run BERT QA model using Java Inference API
+
+In this tutorial, we will walk through the BERT QA model trained by MXNet.
+You will be able to run inference for a general Q & A task:
+
+```text
+Q: When did BBC Japan start broadcasting?
+```
+
+The model is expected to find the right answer in the corresponding text:
+```text
+BBC Japan was a general entertainment Channel. Which operated between December 2004 and April 2006.
+It ceased operations after its Japanese distributor folded.
+```
+And it picks the right one:
+```text
+A: December 2004
+```
+
+## Setup Guide
+
+### Step 1: Download the model
+
+For this tutorial, you can get the model and vocabulary by running the following bash script. This script will use `wget` to download these artifacts from AWS S3.
+
+From the `scala-package/examples/scripts/infer/bert/` folder run:
+
+```bash
+./get_bert_data.sh
+```
+
+### Step 2: Set up the data path and parameters of the model
+
+The available arguments are as follows:
+
+| Argument | Comments |
+| --- | --- |
+| `--model-path-prefix` | Folder path with prefix to the model (including json, params). |
+| `--model-vocab` | Vocabulary path |
+| `--model-epoch` | Epoch number of the model |
+| `--input-question` | Question asked to the model |
+| `--input-answer` | Paragraph that contains the answer |
+| `--seq-length` | Sequence length of the model (384 by default) |
+
+### Step 3: Run Inference
+After the previous steps, you should be able to run the code using the following script that will pass all of the required parameters to the Infer API.
+
+From the `scala-package/examples/scripts/infer/bert/` folder run:
+
+```bash
+./run_bert_qa_example.sh --model-path-prefix ../models/static-bert-qa/static_bert_qa \
+ --model-vocab ../models/static-bert-qa/vocab.json \
+ --model-epoch 2
+```
+
+## Background
+
+To learn more about how BERT works in MXNet, please follow this [tutorial](https://medium.com/apache-mxnet/gluon-nlp-bert-6a489bdd3340).
+
+The model was extracted from GluonNLP with static length settings.
+
+[Download link for the script](https://gluon-nlp.mxnet.io/_downloads/bert.zip)
+
+The original description can be found [here](https://gluon-nlp.mxnet.io/model_zoo/bert/index.html#bert-base-on-squad-1-1).
+```bash
+python static_finetune_squad.py --optimizer adam --accumulate 2 --batch_size 6 --lr 3e-5 --epochs 2 --gpu 0 --export
+```
+This script generates a `json` file and a `params` file, which are the standard MXNet model files.
+By default, this model uses the `bert_12_768_12` model with extra layers for QA jobs.
+
+After that, to be able to use it in Java, we need to export the dictionary from the script to parse the text
+into actual indexes. Please add the following lines after [this line](https://github.com/dmlc/gluon-nlp/blob/master/scripts/bert/staticbert/static_finetune_squad.py#L262).

Review comment: ok, what if you quote the line? That way, if it moves a bit, people still know where to work.
[GitHub] [incubator-mxnet] stu1130 opened a new pull request #14621: [WIP][Dependency Update] Update libjpeg-turbo to 2.0.2
stu1130 opened a new pull request #14621: [WIP][Dependency Update] Update libjpeg-turbo to 2.0.2 URL: https://github.com/apache/incubator-mxnet/pull/14621 ## Description ## Upgrade libjpeg-turbo to **2.0.2** due to an issue in 1.5.90: 1. [divide by zero when processing a crafted BMP image](https://github.com/libjpeg-turbo/libjpeg-turbo/commit/43e84cff1bb2bd8293066f6ac4eb0df61bc6) Version 2.0.2 also fixes a performance issue: > Fixed a severe performance issue in the Loongson MMI SIMD extensions that occurred when compressing RGB images whose image rows were not 64-bit-aligned. as stated in the [release notes](https://github.com/libjpeg-turbo/libjpeg-turbo/releases) ## Checklist ## ### Essentials ### - [ ] Test build with Ubuntu 14.04 - [ ] Test build with Ubuntu 16.04 ### Changes ### ## Comments ##
[GitHub] [incubator-mxnet] roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272281085 ## File path: python/mxnet/gluon/estimator/estimator.py ## @@ -175,14 +174,51 @@ def _batch_fn(self, batch, ctx, is_iterator=False): label = gluon.utils.split_and_load(label, ctx_list=ctx, batch_axis=0) return data, label +def _infer_data_info(self, data): +"""Retrieve the data information such as batch size, +Number of batches, and total number of samples + +Parameters +-- +data : DataLoader +A DataLoader instance with data and/or label + +Returns +--- +num_batches: int +Number of batches the data is divided into +total_samples: int +Total Number of samples +batch_size: int +Batch size +""" +if isinstance(data, gluon.data.DataLoader): +if isinstance(data._dataset, gluon.data.ArrayDataset): Review comment: many other Dataset subclasses have `._data`
[GitHub] [incubator-mxnet] roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272281252 ## File path: python/mxnet/gluon/estimator/estimator.py ## @@ -175,14 +174,51 @@ def _batch_fn(self, batch, ctx, is_iterator=False): label = gluon.utils.split_and_load(label, ctx_list=ctx, batch_axis=0) return data, label +def _infer_data_info(self, data): +"""Retrieve the data information such as batch size, +Number of batches, and total number of samples + +Parameters +-- +data : DataLoader +A DataLoader instance with data and/or label + +Returns +--- +num_batches: int +Number of batches the data is divided into +total_samples: int +Total Number of samples +batch_size: int +Batch size +""" +if isinstance(data, gluon.data.DataLoader): +if isinstance(data._dataset, gluon.data.ArrayDataset): +total_samples = data._dataset._data[0].shape[0] +elif isinstance(data._dataset, nd.ndarray.NDArray): Review comment: add some comments on why you are avoiding using `len(dataset)`
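For reference, the batch bookkeeping that `_infer_data_info` is after can be sketched in plain Python. The names and the `last_batch` handling here are illustrative, not the PR's actual code (which reads these values off a Gluon DataLoader):

```python
import math

def infer_data_info(total_samples, batch_size, last_batch="keep"):
    """Derive the number of batches from sample count and batch size.

    Hypothetical helper: `last_batch` mirrors the DataLoader option of the
    same name ("keep" retains a final partial batch, "discard" drops it).
    """
    if last_batch == "discard":
        num_batches = total_samples // batch_size
    else:  # "keep": a trailing partial batch still counts as one batch
        num_batches = math.ceil(total_samples / batch_size)
    return num_batches, total_samples, batch_size

print(infer_data_info(100, 32))             # (4, 100, 32)
print(infer_data_info(100, 32, "discard"))  # (3, 100, 32)
```

The reviewer's point stands either way: where the sample count comes from (e.g. `dataset._data[0].shape[0]` vs `len(dataset)`) deserves a comment in the real code.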
[GitHub] [incubator-mxnet] lanking520 commented on issue #4159: How to bind 3 inputs using mxnet.io.NDArrayIter?
lanking520 commented on issue #4159: How to bind 3 inputs using mxnet.io.NDArrayIter? URL: https://github.com/apache/incubator-mxnet/issues/4159#issuecomment-480081968 Closing this issue due to inactivity. Please feel free to reopen this if you are still facing the same problems.
[GitHub] [incubator-mxnet] lanking520 commented on issue #5487: "Check if MXNet is installed": Python import error: symbol not found: ___addtf3
lanking520 commented on issue #5487: "Check if MXNet is installed": Python import error: symbol not found: ___addtf3 URL: https://github.com/apache/incubator-mxnet/issues/5487#issuecomment-480081384 @PeterFelixNguyen Are you still facing this problem?
[GitHub] [incubator-mxnet] anirudh2290 commented on a change in pull request #14173: [WIP] MXNet AMP (automatic mixed precision)
anirudh2290 commented on a change in pull request #14173: [WIP] MXNet AMP (automatic mixed precision) URL: https://github.com/apache/incubator-mxnet/pull/14173#discussion_r272388590 ## File path: python/mxnet/amp/lists/symbol.py ## @@ -0,0 +1,177 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +# coding: utf-8 +"""Lists of functions whitelisted/blacklisted for automatic mixed precision in symbol API.""" + +FP16_FUNCS = [ +'Convolution', +'Deconvolution', +'FullyConnected', +'RNN', +] + +FP32_FUNCS = [ Review comment: Can we add support in the API for updating the lists or using custom lists for the user? Users may want to experiment with certain ops being run in FP16 vs FP32 to see accuracy changes.
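The list-driven casting decision, extended with the user-override idea raised in this review, might look like the following sketch. The override parameters are hypothetical, not part of the PR; only the FP16 op names come from the quoted list:

```python
# Whitelist of ops considered safe in half precision (from the PR's list)
# and an example blacklist of ops kept in single precision.
FP16_FUNCS = {"Convolution", "Deconvolution", "FullyConnected", "RNN"}
FP32_FUNCS = {"softmax", "log_softmax"}  # illustrative entries only

def target_dtype(op_name, user_fp16=(), user_fp32=()):
    """Decide the dtype an op should run in, with user overrides winning."""
    if op_name in user_fp32:
        return "float32"
    if op_name in user_fp16:
        return "float16"
    if op_name in FP16_FUNCS:
        return "float16"
    if op_name in FP32_FUNCS:
        return "float32"
    return "unchanged"  # ops not on either list pass through untouched

print(target_dtype("Convolution"))                            # float16
print(target_dtype("Convolution", user_fp32={"Convolution"})) # float32
print(target_dtype("concat"))                                 # unchanged
```

Exposing `user_fp16`/`user_fp32`-style arguments is exactly the experimentation hook the reviewer asks for: force individual ops into FP32 and measure the accuracy delta.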
[GitHub] [incubator-mxnet] roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272386502 ## File path: tests/python/unittest/test_gluon_estimator.py ## @@ -260,7 +251,8 @@ def test_context(): loss=loss, metrics=metrics) # input list of context -ctx = [mx.gpu(0), mx.gpu(1)] +ctx = [mx.cpu() for _ in range(2)] Review comment: change this to `ctx = [list of gpu ctxs] if num_gpus > 1 else [mx.cpu()]`
[GitHub] [incubator-mxnet] lanking520 closed issue #4159: How to bind 3 inputs using mxnet.io.NDArrayIter?
lanking520 closed issue #4159: How to bind 3 inputs using mxnet.io.NDArrayIter? URL: https://github.com/apache/incubator-mxnet/issues/4159
[GitHub] [incubator-mxnet] roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API
roywei commented on a change in pull request #14587: [MXNET-1344, 1346][FIT API] Retrieve Batch size and Logging verbose support for Gluon fit() API URL: https://github.com/apache/incubator-mxnet/pull/14587#discussion_r272282374 ## File path: python/mxnet/gluon/estimator/estimator.py ## @@ -241,16 +274,23 @@ def fit(self, train_data, from a data batch and load into contexts(devices) """ - self.epochs = epochs -if not batch_size: -batch_size = 32 * len(self.context) +num_batches, total_samples, batch_size = self._infer_data_info(train_data) + +if isinstance(self.context, list): Review comment: we already made sure context is a list during the context check in the estimator's `__init__`
[GitHub] [incubator-mxnet] abhinavs95 edited a comment on issue #14373: Passing parameters to HybridBlocks and not using them
abhinavs95 edited a comment on issue #14373: Passing parameters to HybridBlocks and not using them URL: https://github.com/apache/incubator-mxnet/issues/14373#issuecomment-479228739 @whamza15 This is not an issue of not using all variables in hybrid_forward, as the following test works:

```
import mxnet.gluon as gl
import mxnet as mx

class EmbeddingBlock(gl.HybridBlock):
    def __init__(self, num_toks, dim, **kwargs):
        super(EmbeddingBlock, self).__init__(**kwargs)
        self.emb = gl.nn.Embedding(num_toks, dim)

    def hybrid_forward(self, F, x, valid_length):
        # NOTE valid_length is not used
        return self.emb(x)

net = EmbeddingBlock(10, 100)
net.initialize()
net.hybridize()
x1 = mx.nd.array(range(8)).reshape(2, -1)
vl1 = mx.nd.array([3, 2])
x2 = mx.nd.array(range(8)).reshape(2, -1)
vl2 = mx.nd.array([3, 2])
net(x1, vl1)
print(net.collect_params())
```

EDIT: The above test works because deferred initialization is not used for embedding layers. For layers that use deferred initialization, like `nn.Dense`, the issue exists, as can be verified with the following:

```
class Net(gl.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.dense = gl.nn.Dense(3, flatten=False)

    def hybrid_forward(self, F, x, v1):
        return self.dense(x)

net = Net()
net.initialize()
net.hybridize()
x = mx.nd.array(range(8)).reshape(2, -1)
v1 = mx.nd.array([3, 2])
net(x, v1)
```

Error message:

```
/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py:540: UserWarning: The 1-th input to HybridBlock is not used by any computation. Is this intended?
  out = self.forward(*args)
infer_shape error. Arguments:
  data0: (2, 4)
  data1: (2,)
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in _call_cached_op
    for is_arg, i in self._cached_op_args]
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 803, in <listcomp>
    for is_arg, i in self._cached_op_args]
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 494, in data
    return self._check_and_get(self._data, ctx)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/parameter.py", line 208, in _check_and_get
    "num_features, etc., for network layers."%(self.name))
mxnet.gluon.parameter.DeferredInitializationError: Parameter 'dense0_weight' has not been initialized yet because initialization was deferred. Actual initialization happens during the first forward pass. Please pass one batch of data through the network before accessing Parameters. You can also avoid deferred initialization by specifying in_units, num_features, etc., for network layers.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 789, in _deferred_infer_shape
    self.infer_shape(*args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 862, in infer_shape
    self._infer_attrs('infer_shape', 'shape', *args)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/gluon/block.py", line 851, in _infer_attrs
    **{i.name: getattr(j, attr) for i, j in zip(inputs, args)})
  File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 996, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1126, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/anaconda3/lib/python3.7/site-packages/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:53:40] src/c_api/c_api_symbolic.cc:494: InferShapeKeyword argument name data1 not found. Candidate arguments: [0]data0 [1]dense0_weight [2]dense0_bias
[mangled C++ stack trace truncated]
```
[GitHub] [incubator-mxnet] anirudh2290 commented on issue #14619: [Discussion] 1.5.0 Roadmap
anirudh2290 commented on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480074711 Thanks for starting this! I would like to include exception handling fixes: #14397 (@anirudh2290), #14433 (@anirudh2290), #14575 (@arcadiaphy). These three should be merged by the end of next week, hopefully. Conversion of FP32 models to mixed precision models (#14584) should be in by the first week of May, tentatively. In addition, I have some changes to the profiler to visualize GPU memory pooling and help make better decisions on the env variable choice. It is currently in a branch (https://github.com/anirudh2290/mxnet/tree/memory_profiler_poc2) and I intend to open a PR soon (next week).
[GitHub] [incubator-mxnet] yuxihu commented on a change in pull request #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
yuxihu commented on a change in pull request #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#discussion_r272351599 ## File path: src/engine/engine.cc ## @@ -67,4 +67,26 @@ Engine* Engine::Get() { static Engine *inst = _GetSharedRef().get(); return inst; } + +void Engine::PushAsyncPtr(AsyncFnPtr exec_fn_ptr, const std::shared_ptr& param, + Context exec_ctx, std::vector const& const_vars, + std::vector const& mutable_vars, + FnProperty prop, int priority, + const char* opr_name, bool wait) { + auto exec_fn = [exec_fn_ptr, param](RunContext rctx, Review comment: I don't think we can do a ref capture here. This is where we want to make a copy of the shared_ptr, which increases its ref count. When exec_fn_ptr is called in another thread, we need to make sure param is still valid.
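The lifetime argument here (copy the `shared_ptr` into the lambda so the parameter outlives the caller) has a direct Python analog, sketched below for illustration only: the task closure holds its own strong reference, the way a by-value `shared_ptr` capture bumps the ref count, so the object stays valid for the worker thread even after the caller drops its reference.

```python
import weakref
from concurrent.futures import ThreadPoolExecutor

class Param:
    """Stand-in for the shared parameter passed to the engine."""

param = Param()
probe = weakref.ref(param)   # observes whether the object is still alive

def exec_fn(p=param):        # the default arg is the closure's own strong
    return p is not None     # reference, like a copied shared_ptr

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(exec_fn)
    del param                # caller's reference is gone...
    alive_in_worker = future.result()

print(alive_in_worker)       # True: the task's own reference kept it valid
del exec_fn                  # drop the last reference
print(probe() is None)       # True: the object is finally released
```

With a by-reference capture there would be no extra reference, and the worker could observe a dead object, which is exactly the hazard the review comment points out.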
[GitHub] [incubator-mxnet] stu1130 opened a new pull request #14620: [WIP][Dependency Update] Upgrade the libpng to 1.6.35
stu1130 opened a new pull request #14620: [WIP][Dependency Update] Upgrade the libpng to 1.6.35 URL: https://github.com/apache/incubator-mxnet/pull/14620
## Description ##
Upgrade the libpng package to **1.6.35** due to the following issues in 1.6.34:
1. [SEGV in function png_free_data](https://github.com/glennrp/libpng/issues/238), more on [link1](https://github.com/fouzhe/security/tree/master/libpng), [link2](http://www.oracle.com/technetwork/security-advisory/cpuoct2018-4428296.html)
2. [Division by zero causes LibPNG to crash](https://sourceforge.net/p/libpng/bugs/278/), more on [link](https://www.cvedetails.com/cve/CVE-2018-13785/)

Note that the latest stable version is **1.6.36**, but it has one known memory leak issue: [memory leak in png_create_info_struct](https://github.com/glennrp/libpng/issues/269)
## Checklist ##
### Essentials ###
Please feel free to remove inapplicable items for your PR.
- [ ] Test build with Ubuntu 14.04
- [ ] Test build with Ubuntu 16.04
### Changes ###
## Comments ##
[GitHub] [incubator-mxnet] eric-haibin-lin commented on issue #14619: [Discussion] 1.5.0 Roadmap
eric-haibin-lin commented on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480062264 Hi everyone, I've created the v1.5.x branch here: https://github.com/apache/incubator-mxnet/tree/v1.5.x Until we have an agreement on the timeline and features, I will synchronize this branch with the master branch periodically. Once we have decided on the code freeze date, we will only cherry-pick required changes/features to the branch.
[incubator-mxnet] branch v1.5.x created (now 5f19362)
This is an automated email from the ASF dual-hosted git repository. haibin pushed a change to branch v1.5.x in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git. at 5f19362 fix tests (#14565) No new revisions were added by this update.
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new 0e5b876 Bump the publish timestamp. 0e5b876 is described below commit 0e5b876ec80fdf5465e03abd045a30233bcf9b0d Author: mxnet-ci AuthorDate: Thu Apr 4 20:48:26 2019 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..5da6bbb --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Thu Apr 4 20:48:26 UTC 2019
[GitHub] [incubator-mxnet] drivanov commented on issue #14443: Mxnet allclose
drivanov commented on issue #14443: Mxnet allclose URL: https://github.com/apache/incubator-mxnet/pull/14443#issuecomment-480047448 I think that it is a good time to merge these changes.
[GitHub] [incubator-mxnet] samskalicky commented on issue #14570: [WIP] use a compile flag to use int64 tensor size
samskalicky commented on issue #14570: [WIP] use a compile flag to use int64 tensor size URL: https://github.com/apache/incubator-mxnet/pull/14570#issuecomment-480044526 @larroy the dimension can be -1 if it's unknown, or -1 can refer to the last dimension. So we do have to use signed types.
[GitHub] [incubator-mxnet] larroy edited a comment on issue #13472: [WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438
larroy edited a comment on issue #13472: [WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 URL: https://github.com/apache/incubator-mxnet/pull/13472#issuecomment-480044194 I think we need to do more testing to be confident that the solution is solid. I can do it during my next oncall. The change might be trivial, but checking for correctness in the multi-threaded scenario and with interactions from other libraries is costly in terms of developer time.
[GitHub] [incubator-mxnet] larroy commented on a change in pull request #14570: [WIP] use a compile flag to use int64 tensor size
larroy commented on a change in pull request #14570: [WIP] use a compile flag to use int64 tensor size URL: https://github.com/apache/incubator-mxnet/pull/14570#discussion_r272345438
## File path: include/mxnet/tensor_blob.h ##
@@ -456,7 +456,7 @@ class FieldEntry
     this->enforce_nonzero_ = true;
     return this->self();
   }
-  inline FieldEntry &set_expect_ndim(mxnet::index_t ndim) {
+  inline FieldEntry &set_expect_ndim(int ndim) {
Review comment: I get your point. I'm not convinced that the dimensions type should be signed, as it makes no sense. Agree that the int width should be enough. I personally would make it unsigned.
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #13943: mxnet read jpeg images give different values compared with other libs
abhinavs95 commented on issue #13943: mxnet read jpeg images give different values compared with other libs URL: https://github.com/apache/incubator-mxnet/issues/13943#issuecomment-480039605 Hi @2void, mxnet.image.imread() uses OpenCV's imread under the hood. Using the snippet below, we can check that the results from both are identical (I am using mxnet 1.4.0):
```
import mxnet
import numpy as np
import cv2

filename = "test.jpeg"
# need to convert, as the default channel order for OpenCV is BGR
img_cv = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
img_mx = mxnet.image.imread(filename).asnumpy()
print(np.where(img_cv != img_mx))
```
The difference in output between OpenCV and skimage, from what I could gather, is due to different techniques of decoding the JPEG image (source: https://github.com/scikit-image/scikit-image/issues/2293). However, the outputs cannot be distinguished by the human eye. If you want to ensure you have exactly the same output as mxnet.image.imread, I would suggest using OpenCV.
[GitHub] [incubator-mxnet] szha commented on issue #14619: [Discussion] 1.5.0 Roadmap
szha commented on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480038281 The changes since the 1.4.0 release that are already merged in the master branch will be included in the 1.5.0 release. The list can be found at: https://github.com/apache/incubator-mxnet/compare/v1.4.x...master?expand=1
[incubator-mxnet-site] branch asf-site updated: Bump the publish timestamp.
This is an automated email from the ASF dual-hosted git repository. zhasheng pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-mxnet-site.git The following commit(s) were added to refs/heads/asf-site by this push: new ac918d6 Bump the publish timestamp. ac918d6 is described below commit ac918d629437e8d233d8c203567685bf484677e5 Author: mxnet-ci AuthorDate: Thu Apr 4 19:26:47 2019 + Bump the publish timestamp. --- date.txt | 1 + 1 file changed, 1 insertion(+) diff --git a/date.txt b/date.txt new file mode 100644 index 000..e7da823 --- /dev/null +++ b/date.txt @@ -0,0 +1 @@ +Thu Apr 4 19:26:47 UTC 2019
[GitHub] [incubator-mxnet] anirudh2290 commented on a change in pull request #14570: [WIP] use a compile flag to use int64 tensor size
anirudh2290 commented on a change in pull request #14570: [WIP] use a compile flag to use int64 tensor size URL: https://github.com/apache/incubator-mxnet/pull/14570#discussion_r272331656
## File path: Makefile ##
@@ -188,6 +188,11 @@ ifeq ($(USE_OPERATOR_TUNING), 1)
   CFLAGS += -DMXNET_USE_OPERATOR_TUNING=1
 endif
+
+ifeq ($(USE_INT64_TENSOR_SIZE), 1)
+  CFLAGS += -DMSHADOW_INT64_TENSOR_SIZE=1
Review comment: Yeah, it probably doesn't fall under semver. We should still let users like dgl know about the change and how they can build for large tensor support.
[GitHub] [incubator-mxnet] mxnet-label-bot commented on issue #14619: [Discussion] 1.5.0 Roadmap
mxnet-label-bot commented on issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619#issuecomment-480029433 Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Feature
[GitHub] [incubator-mxnet] szha opened a new issue #14619: [Discussion] 1.5.0 Roadmap
szha opened a new issue #14619: [Discussion] 1.5.0 Roadmap URL: https://github.com/apache/incubator-mxnet/issues/14619 Let's start a discussion here about the roadmap towards 1.5.0. We are looking for:
- New features that are useful to your research
- Improvements and patches to existing features

If you have any item that you'd like to propose for the roadmap, please:
- Create (or locate an existing) issue for the item, and note the issue number.
- Comment in this issue with: 1) the above issue number, and 2) one sentence on what the item is about and why it's useful to you.
- Indicate whether you'd be willing to help out on the item.
[GitHub] [incubator-mxnet] szha commented on issue #9686: MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0)
szha commented on issue #9686: MXNet 2.0 Roadmap (was: APIs that might be a good idea to break in 2.0) URL: https://github.com/apache/incubator-mxnet/issues/9686#issuecomment-480028311 https://lists.apache.org/thread.html/a969d92e32f39e9540f3afd3d3a594efb0591083669a79e1accd02d4@%3Cdev.mxnet.apache.org%3E
[GitHub] [incubator-mxnet] lanking520 commented on issue #14558: How to read multi-rec files efficiently?
lanking520 commented on issue #14558: How to read multi-rec files efficiently? URL: https://github.com/apache/incubator-mxnet/issues/14558#issuecomment-480018040 @weihua04 Could you please share your code here?
[GitHub] [incubator-mxnet] anirudh2290 commented on a change in pull request #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
anirudh2290 commented on a change in pull request #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#discussion_r272315914
## File path: src/engine/engine.cc ##
@@ -67,4 +67,26 @@ Engine* Engine::Get() {
   static Engine *inst = _GetSharedRef().get();
   return inst;
 }
+
+void Engine::PushAsyncPtr(AsyncFnPtr exec_fn_ptr, const std::shared_ptr& param,
+                          Context exec_ctx, std::vector const& const_vars,
+                          std::vector const& mutable_vars,
+                          FnProperty prop, int priority,
+                          const char* opr_name, bool wait) {
+  auto exec_fn = [exec_fn_ptr, param](RunContext rctx,
Review comment: Can we do a by-ref capture here?
[GitHub] [incubator-mxnet] anirudh2290 commented on a change in pull request #14575: fix custom exception handling
anirudh2290 commented on a change in pull request #14575: fix custom exception handling URL: https://github.com/apache/incubator-mxnet/pull/14575#discussion_r272305858
## File path: src/operator/custom/custom-inl.h ##
@@ -96,7 +96,14 @@ class CustomOperator {
   bool prev_recording = Imperative::Get()->set_is_recording(recording);
   bool prev_training = Imperative::Get()->set_is_training(training);
-  func();
+  try {
+    func();
+  } catch (dmlc::Error& e) {
+    exception_ =
+        std::make_shared(std::current_exception());
+    ctx.async_on_complete();
+    return;
+  }
Review comment: I think we can solve both 1 and 2 this way: after func is called, do wait_to_read on all elements in arrs, then catch and save the exception. Remove lines 104 and 105. In PushSync, check whether the exception is set and rethrow it; also catch it there, call async_on_complete, and return. Something like the following:
```
Engine::Get()->PushSync(
    [=](RunContext rctx) {
      try {
        if (exception_) {
          std::rethrow_exception(exception_);
        }
      } catch (dmlc::Error& err) {
        ctx.async_on_complete(&err);
        return;
      }
    }
```
Thanks to the support added for Horovod in https://github.com/apache/incubator-mxnet/pull/13932, we may be able to leverage this to call async_on_complete with the error.
[GitHub] [incubator-mxnet] anirudh2290 commented on a change in pull request #14575: fix custom exception handling
anirudh2290 commented on a change in pull request #14575: fix custom exception handling URL: https://github.com/apache/incubator-mxnet/pull/14575#discussion_r272302780
## File path: src/engine/threaded_engine.cc ##
@@ -373,10 +374,12 @@ void ThreadedEngine::DeleteVariable(SyncFn delete_fn,
 }
 void ThreadedEngine::WaitForVar(VarHandle var) {
+  using mxnet::op::custom::CustomOperator;
   BulkFlush();
   ThreadedVar* threaded_var = ThreadedVar::CastFromBase(var);
   if (threaded_var->ready_to_read()) {
     ThrowException(threaded_var);
+    CustomOperator::Get()->ThrowException();
Review comment: Let's remove this. ThreadedEngine should not depend on custom operator code.
[GitHub] [incubator-mxnet] piyushghai commented on issue #13472: [WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438
piyushghai commented on issue #13472: [WIP][Don't merge] Comment out dmlc::SetEnv in pthread_atfork #13438 URL: https://github.com/apache/incubator-mxnet/pull/13472#issuecomment-480008047 @larroy Gentle ping.
[incubator-mxnet] branch master updated: added note about cuda9.2 requirement (#14140)
This is an automated email from the ASF dual-hosted git repository. lanking pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git The following commit(s) were added to refs/heads/master by this push: new 43f7c12 added note about cuda9.2 requirement (#14140) 43f7c12 is described below commit 43f7c12cfc5079d950200b14aec16eddd34455c5 Author: Sam Skalicky AuthorDate: Thu Apr 4 11:05:34 2019 -0700 added note about cuda9.2 requirement (#14140) --- scala-package/mxnet-demo/java-demo/README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/scala-package/mxnet-demo/java-demo/README.md b/scala-package/mxnet-demo/java-demo/README.md index cad52cb..3e742fc 100644 --- a/scala-package/mxnet-demo/java-demo/README.md +++ b/scala-package/mxnet-demo/java-demo/README.md @@ -32,6 +32,8 @@ This command will pick the default values specified in the [pom](https://github. Note: If you are planning to use GPU, please add `-Dmxnet.profile=linux-x86_64-gpu` +Note: The Maven package is built with CUDA 9.2. + ### Use customized version set You can use the following instruction as an alternative to achieve the same result: You may use `mvn package` to build the package,
[GitHub] [incubator-mxnet] lanking520 merged pull request #14140: Doc Fix: added note about cuda9.2 requirement to Java example
lanking520 merged pull request #14140: Doc Fix: added note about cuda9.2 requirement to Java example URL: https://github.com/apache/incubator-mxnet/pull/14140
[GitHub] [incubator-mxnet] mxnet-label-bot commented on issue #14618: Consistent crash on CI test Python3: TensorRT GPU
mxnet-label-bot commented on issue #14618: Consistent crash on CI test Python3: TensorRT GPU URL: https://github.com/apache/incubator-mxnet/issues/14618#issuecomment-480003195 Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Test, CI
[GitHub] [incubator-mxnet] lanking520 opened a new issue #14618: Consistent crash on CI test Python3: TensorRT GPU
lanking520 opened a new issue #14618: Consistent crash on CI test Python3: TensorRT GPU URL: https://github.com/apache/incubator-mxnet/issues/14618 This test fails at:
```
=== Model: cifar_resnet20_v1 ===
*** Running inference using pure MXNet ***
```
Related CI runs:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/master/496/pipeline
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-14575/3/pipeline
Error message:
```
[2019-04-04 17:21:17 ERROR] Cuda initialization failure with error 35. Please check cuda installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html.
terminate called after throwing an instance of 'std::runtime_error'
```
[GitHub] [incubator-mxnet] piyushghai commented on issue #14140: Doc Fix: added note about cuda9.2 requirement to Java example
piyushghai commented on issue #14140: Doc Fix: added note about cuda9.2 requirement to Java example URL: https://github.com/apache/incubator-mxnet/pull/14140#issuecomment-480001337 @lanking520 @samskalicky @zachgk Can we take this PR to completion?
[incubator-mxnet] branch numpy updated (29281cd -> 921045f)
This is an automated email from the ASF dual-hosted git repository. reminisce pushed a change to branch numpy in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git. discard 29281cd [Numpy] fix test_operator_gpu.test_upsampling_bilinear_with_type (#14557) discard cc078c2 [Numpy] Misc fix (#14612) discard 5ac0b4a Fix pooling_v1 and deformable_convolution param initialization (#14577) discard 2b1599f Fix cpp package build after using new shape definition (#14554) discard c6bab57 fix R-package (#14536) discard 12018d2 fix concat and slice (#14549) discard db37dd9 [numpy] Fix numpy import in python2 (#14537) discard fcefc5a [numpy] Fix test_dynamic_shape.test_dynamic_shape (#14538) discard 38f2e06 Fix a bug to pass the test in test_contrib_rnn (#14520) discard 64c61b9 [numpy] Fix unit tests after introducing numpy compatible shapes (#14487) discard 47e2348 [WIP] Use new shape definition (#14453) discard 0731af7 [Numpy] Change semantics of ndim for operators in `src/operator/contrib` (#14409) discard b4497e7 [numpy] Shape support scalar tensor (#14315) new ce99e49 Cudnn conv dgrad algo filtering (#14310) new 4432af1 [MXNET-1226] add Docs update for MXNet Java (#14395) new ae55b75 fix Makefile (#14424) new 9fd3153 [MXNET-1291] solve pylint errors in examples with issue no.12205 (#13938) new 88b3741 Disables flaky TestStochasticTiming_2D test (#14412) new b077965 Add dtype visualization to plot_network (#14066) new 74c2274 Support multi-threading for Custom Operator (#14363) new d1fcda9 Fix entropy for uint8 (#14150) new d001eaf what's new - add 1.4.0 release (#14435) new 43173f5 moveaxis operator now accepts negative indices and sequence of ints as well. 
(#14321) new 226212b Add repr for SymbolBlock (#14423) new a091d36 temporarily disable integ tests with a dependency on origami repo (#14448) new f602b0d fix OOM error during resource allocation (#1) new 63ed258 Correct update count with Gluon trainer and update_on_kvstore=False (#14377) new c2f939f Update MKL-DNN to v0.18 release (was: fix the Dense layer issue) (#13668) new 020e832 Speedup _contrib_index_copy (#14359) new ab5b44c Fix crashes on visualization (#14425) new ed77d6d add contributors from intel (#14455) new d671528 begin=end not a valid input (#14403) new 3ab1dec Fix memory leak for size-zero ndarray (#14365) new c56c146 [Doc] Start the tutorials for MKL-DNN backend (#14202) new c31bb7e Enforce determinism for backwards compatibility checker (#14463) new f838f67 [MKL-DNN] Enable s8 support for inner product and 3d input with flatten=false (#14466) new 56b7b67 Fixes the test_sgld (#14473) new f98820c Revert "Fix memory leak for size-zero ndarray (#14365)" (#14477) new 4b1811c fix custom operation in fork (#14451) new 3b28e62 Change Straight Dope to Dive into Deep Learning (#14465) new 95d4680 Added link to landing page for Java examples (#14481) new a01bdee Fixes test_operator_gpu.test_multinomial_generator (#14475) new a88c562 [MXNET-949] Module API to Gluon API tutorial (#12542) new 29e13b4 Fixed tutorial warnings (#14472) new 056fce4 Add examples of running MXNet with Horovod (#14286) new a9458ca Fixes for CI downloads (#14504) new f8a0dbc Enhance PartitionGraph (#14277) new 092af36 [MXNET-1285] Draw bounding box with Scala/Java Image API (#14474) new 651a6c0 reenable the test (#14483) new c4cd49c Fix script retrieval (#14519) new 3d20f2a add filter to warnings (#14532) new 67c10f9 Adds context parameter to check_rnn_layer_forward calls in test_lstmp (#14529) new 5d2a451 Performance improving for MKL-DNN Quantized FullyConnected (#14528) new 09daf22 speedup SequenceMask on GPU (#14445) new 645c778 Tidy up storage allocation and deallocation (#14480) 
new 102b46f Memory fixes. Resolves #10867, and resolves #14080 (#14372) new 84c2ae1 Remove unnecessary "also" in README.md (#14543) new b20f08b [clojure]: add comp-metric based on CompositeEvalMetric (#14553) new 9f5dfbf Chouffe/clojure fix tests (#14531) new 4d04238 [clojure][image] add draw-bounding-box interop (#14533) new 5f19362 fix tests (#14565) new 8c2a25f Enhance subgraph API (#14113) new 4075212 Do not touch GPU 0 during ReleaseAll (#14550) new b6eac1d Change CUB submodule to track Nvidia CUB project. (#13322) new 09ba8be Fixes static build script for cub directory rename (#14578) new 33b6543 example/ssd/evaluate/eval_metric.py (#14561) new e2f5b47 Support SyncBatchNorm5D (#14542) new 6392666 Disable Flaky Test test_poisson_generator (#14540) new 9e4ee99 [MXNET-1357] Fix the cpp-examples to add exception handling (#14441) new dde77d4 Updates gpu tests to use CUDNN_VERSION supplied by the en
[GitHub] [incubator-mxnet] piyushghai commented on issue #12641: Add docker file which includes sample jupyter notebook (#12611)
piyushghai commented on issue #12641: Add docker file which includes sample jupyter notebook (#12611) URL: https://github.com/apache/incubator-mxnet/pull/12641#issuecomment-47992 @gautamkmr Any updates on this PR?
[GitHub] [incubator-mxnet] piyushghai commented on issue #12440: Add stable nrm2 for L2 normalization
piyushghai commented on issue #12440: Add stable nrm2 for L2 normalization URL: https://github.com/apache/incubator-mxnet/pull/12440#issuecomment-47717 @TD Ping again.
[GitHub] [incubator-mxnet] piyushghai commented on issue #11184: Make amalgamation part be suitable for iOS
piyushghai commented on issue #11184: Make amalgamation part be suitable for iOS URL: https://github.com/apache/incubator-mxnet/pull/11184#issuecomment-47540 @Aozorany This PR seems to have been inactive for a long time. If you need any help to take this PR forward, please feel free to reach out here. I'd suggest closing this PR in the meantime.
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #13126: Can't break loop when using > 1 worker with DataLoader
abhinavs95 commented on issue #13126: Can't break loop when using > 1 worker with DataLoader URL: https://github.com/apache/incubator-mxnet/issues/13126#issuecomment-479995189 Hi @ThomasDelteil, I think this issue is fixed in 1.4.0; I am able to run your example with no errors. Also mentioned in #14541.
[GitHub] [incubator-mxnet] yuxihu edited a comment on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
yuxihu edited a comment on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#issuecomment-479971567
> It is better to use the pure C type in the parameters list.
> Because different compilers (such as gcc, clang, msvc) may have different implementations of STL containers (std::vector, std::shared_ptr, etc.), it will cause the ABI compatible problem too.

This is a good point, but we may not be able to use the pure C type here. We are pushing execution onto another thread, and the lifetime of the params needs to be managed properly; shared_ptr is a straightforward choice. I am not aware of any ABI compatibility issue for shared_ptr as of now; please suggest if you know any. Another option to consider is to implement a shared_ptr class ourselves to remove the dependency on std::shared_ptr.
[GitHub] [incubator-mxnet] piyushghai commented on issue #14617: PDF operators for the random samplers, and also the Dirichlet
piyushghai commented on issue #14617: PDF operators for the random samplers, and also the Dirichlet URL: https://github.com/apache/incubator-mxnet/pull/14617#issuecomment-479993149 Thanks for migrating your PR from the v1.3.x branch to master. @mxnet-label-bot Add [pr-awaiting-review, operator]
[GitHub] [incubator-mxnet] piyushghai commented on issue #14616: Safer norm
piyushghai commented on issue #14616: Safer norm URL: https://github.com/apache/incubator-mxnet/pull/14616#issuecomment-479992079 Thanks for your contributions @haojin2. Can you also look into the CI failures? @mxnet-label-bot Add [Operator, Backend]
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #13313: error when DataLoader's num_workers is not 0
abhinavs95 commented on issue #13313: error when DataLoader's num_workers is not 0 URL: https://github.com/apache/incubator-mxnet/issues/13313#issuecomment-479990693 @mengjiexu looks like #14541 is a duplicate of this issue. According to @zhreshold's comment there, the issue is fixed in 1.4.0 and on master.
[GitHub] [incubator-mxnet] piyushghai commented on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine
piyushghai commented on issue #14615: Add PushAsyncPtr and PushSyncPtr APIs in engine URL: https://github.com/apache/incubator-mxnet/pull/14615#issuecomment-479990917 Thanks for your contributions @yuxihu. @mxnet-label-bot Add [Backend]
[GitHub] [incubator-mxnet] piyushghai commented on issue #14613: [MXNET-978] Higher order gradient support for some unary operators
piyushghai commented on issue #14613: [MXNET-978] Higher order gradient support for some unary operators URL: https://github.com/apache/incubator-mxnet/pull/14613#issuecomment-479989084 Thanks for your contributions @apeforest. Can you also look into the CI failures? @mxnet-label-bot Add [Operator, pr-awaiting-review, Backend].
[GitHub] [incubator-mxnet] abhinavs95 commented on issue #14505: RandomResizedCrop produces wrong aspect ratios.
abhinavs95 commented on issue #14505: RandomResizedCrop produces wrong aspect ratios. URL: https://github.com/apache/incubator-mxnet/issues/14505#issuecomment-479988230 @mzient the fix is in #14585. Thank you for your solution.
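A common remedy for skewed aspect-ratio sampling is to draw the ratio log-uniformly, so a ratio r and its inverse 1/r are equally likely. The sketch below illustrates that idea only; it is not necessarily the exact change made in #14585.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Sample an aspect ratio log-uniformly over [lo, hi]. A naive uniform
// sample over [lo, hi] would make wide crops (ratio > 1) more likely
// than their tall counterparts; sampling in log space removes that bias.
inline double SampleAspectRatio(std::mt19937* rng, double lo, double hi) {
  std::uniform_real_distribution<double> u(std::log(lo), std::log(hi));
  return std::exp(u(*rng));
}
```

With the usual bounds [3/4, 4/3], every draw stays inside the interval and the distribution is symmetric around a ratio of 1 in log space.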
[GitHub] [incubator-mxnet] lanking520 commented on issue #14592: Add BERT QA Scala/Java example
lanking520 commented on issue #14592: Add BERT QA Scala/Java example URL: https://github.com/apache/incubator-mxnet/pull/14592#issuecomment-479982098 @aaronmarkham @piyushghai @zachgk Could you please double-check?
[GitHub] [incubator-mxnet] arcadiaphy edited a comment on issue #14575: fix custom exception handling
arcadiaphy edited a comment on issue #14575: fix custom exception handling URL: https://github.com/apache/incubator-mxnet/pull/14575#issuecomment-479971760 @anirudh2290 I've simplified the exception catching according to your suggestion. Agreed that situation 2 is really tricky: for an `ExecType::kAsync` op, the on_complete callback is asynchronous to the op computation, and right now there is no mechanism to ensure on_complete is not skipped when an exception happens. Currently `ExecType::kAsync` is only used in the custom op.
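One mechanism that could address the concern above is an RAII guard that guarantees a completion callback fires exactly once, even when the wrapped computation throws. This is a hedged sketch with hypothetical names, not MXNet's actual API.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// RAII guard: if the computation throws before Fire() is called, the
// destructor still invokes the completion callback, so it is never skipped.
class OnCompleteGuard {
 public:
  explicit OnCompleteGuard(std::function<void()> on_complete)
      : on_complete_(std::move(on_complete)) {}
  ~OnCompleteGuard() {
    if (!fired_) on_complete_();
  }
  // Call when the computation finishes normally.
  void Fire() {
    if (!fired_) {
      fired_ = true;
      on_complete_();
    }
  }
 private:
  std::function<void()> on_complete_;
  bool fired_ = false;
};
```

A real engine would also need the callback to be safe to run during stack unwinding (it must not throw), which is why the destructor body stays minimal.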
[GitHub] [incubator-mxnet] arcadiaphy commented on a change in pull request #14575: fix custom exception handling
arcadiaphy commented on a change in pull request #14575: fix custom exception handling URL: https://github.com/apache/incubator-mxnet/pull/14575#discussion_r272267663
File path: tests/python/unittest/test_operator.py
@@ -5237,6 +5238,17 @@ def custom_add():
     p.join(5)
     assert not p.is_alive(), "deadlock may exist in custom operator"
+# test except handling
+# see https://github.com/apache/incubator-mxnet/pull/14575
+def custom_add_exc():
Review comment: Sure, I'll add it.
[GitHub] [incubator-mxnet] arcadiaphy commented on a change in pull request #14575: fix custom exception handling
arcadiaphy commented on a change in pull request #14575: fix custom exception handling URL: https://github.com/apache/incubator-mxnet/pull/14575#discussion_r272267517
File path: src/operator/custom/custom-inl.h
@@ -172,10 +186,16 @@ class CustomOperator {
       cv_.wait(lock, [&] {return !q_.empty() || destructing_;});
       while (!q_.empty()) {
         --num_free_threads;
-        auto fn = q_.front();
+        auto task = q_.front();
         q_.pop();
         lock.unlock();
-        fn();
+        try {
+          task.fn();
+        } catch (dmlc::Error& e) {
Review comment: What other types? I think the only valid exception is dmlc::Error; any other exception means buggy code.
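The pattern under review can be sketched in isolation. This is an illustrative reduction with hypothetical names, not the actual CustomOperator code: a worker thread runs a task, catches anything it throws, and stores it as a std::exception_ptr so the caller can rethrow on its own thread; catching `...` rather than a single error type is the alternative the reviewer raises.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <thread>

// Run a task on a worker thread. If the task throws, capture the exception
// so the caller can rethrow it; catching `...` keeps non-dmlc::Error
// exceptions from terminating the worker via std::terminate.
inline std::exception_ptr RunOnWorker(const std::function<void()>& task) {
  std::exception_ptr eptr;
  std::thread worker([&] {
    try {
      task();
    } catch (...) {
      eptr = std::current_exception();
    }
  });
  worker.join();
  return eptr;
}
```

In the real code the captured error would be stored on the operator's state and surfaced to the frontend; here the caller simply rethrows whatever was captured.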