Re: [theano-users] Numpy error during optimization phase
Sorry for the delay. I just re-ran it in a clean conda environment. Here are my system specs:

OS: archlinux
nvidia: 390.25
cuda: 9.1.85
numpy: 1.14.0
pygpu: 0.7.5
theano: git master

.theanorc:

[global]
device = cuda
floatX = float32
warn_float64 = warn
on_opt_error = raise

[nvcc]
fastmath = True

[gpuarray]
preallocate = 0.85

[cuda]
include_path = /opt/cuda/include
library_path = /opt/cuda/lib64

On Wednesday, 7 February 2018 at 21:32:01 UTC+1, nouiz wrote:
> I'm not able to reproduce it.
>
> On which OS? Which Theano version? Can you try a Theano version at least
> 1.0.1?
>
> You can ignore this "error". Mostly, some optimizations are skipped. But I
> would still like to fix it.
>
> I ran the tests like this:
>
> THEANO_FLAGS=device=cuda,floatX=float32 nosetests test_ctc.py &> OUT
>
> What are your Theano flags?
Re: [theano-users] Numpy error during optimization phase
I'm using numpy 1.14.0 from the popular conda-forge repository. BTW, I was wrong about the GPU/CPU distinction: the error is triggered in either case.
[theano-users] Numpy error during optimization phase
Hi everyone,

While using an OpFromGraph involving some operations with binary values, I get an optimization error:

theano.gof.opt: ERROR: Optimization failure due to: local_add_canonizer
theano.gof.opt: ERROR: node: Elemwise{add,no_inplace}(InplaceDimShuffle{0,1,x}.0, InplaceDimShuffle{x,0,1}.0)
theano.gof.opt: ERROR: TRACEBACK:
theano.gof.opt: ERROR: Traceback (most recent call last):
  File "/home/granger/dev/Theano/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/granger/dev/Theano/theano/tensor/opt.py", line 4989, in transform
    num, denum = self.simplify(list(orig_num), list(orig_denum), out.type)
  File "/home/granger/dev/Theano/theano/tensor/opt.py", line 4833, in simplify
    out_type=out_type)
  File "/home/granger/dev/Theano/theano/tensor/opt.py", line 4919, in simplify_constants
    out_type=out_type)
  File "/home/granger/dev/Theano/theano/tensor/opt.py", line 6328, in add_calculate
    v = reduce(np.add, num, zero) - reduce(np.add, denum, zero)
TypeError: numpy boolean subtract, the `-` operator, is deprecated, use the bitwise_xor, the `^` operator, or the logical_xor function instead.

This error does not happen when running on the CPU backend. I suspect it might be due to the use of binary values in my code, but the log message is not very helpful. Is there any way to get more information to track down the error? Note that the fast_compile optimizer does not trigger the error, only the fast_run one.

Demo code and the complete output are available here:
https://gist.github.com/nlgranger/279bda7fff356cfe3f40ad6397d0ba04

Best,
Nicolas
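The TypeError at the bottom of the traceback can be reproduced with NumPy alone, without Theano. The following is a minimal sketch, assuming a NumPy version in which boolean subtraction raises a TypeError (1.14.0, as reported in this thread, does). The function `add_calculate_like` is a hypothetical stand-in that only mirrors the failing expression from the traceback; it is not Theano's actual code.

```python
from functools import reduce

import numpy as np


def add_calculate_like(num, denum, zero):
    """Hypothetical stand-in for the expression shown in the traceback."""
    return reduce(np.add, num, zero) - reduce(np.add, denum, zero)


# Boolean constants: np.add on booleans stays boolean, and the final
# subtraction of two boolean arrays is rejected by NumPy.
bool_num = [np.array([True, False]), np.array([True, True])]
bool_denum = [np.array([False, True])]
try:
    add_calculate_like(bool_num, bool_denum, np.bool_(False))
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Same values cast to int8: the subtraction is well defined.
int_num = [a.astype(np.int8) for a in bool_num]
int_denum = [a.astype(np.int8) for a in bool_denum]
result = add_calculate_like(int_num, int_denum, np.int8(0))
print(result)
```

If the constants in the graph are boolean, casting the binary values to an integer type such as int8 before the addition may avoid this code path, though whether that helps in the original model has not been checked here.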
Re: [theano-users] Split Op (OpFromGraph) to save intermediate results for grad
"Forward the precomputed output" means that Op1 has already computed the final output, so Op2 just has to behave as the identity in the forward pass.

The intermediate value is already an output of Op1, as shown in the example code. Sorry if that wasn't clear.

Nicolas

On Tuesday, 8 August 2017 at 20:56:12 UTC+2, nouiz wrote:
> I don't understand what you mean by "forward the precomputed output".
>
> What I would recommend is to make one op for the forward pass. Make the
> intermediate values that can be reused for the gradient outputs of that op.
> Don't use them in the forward pass, but you can reuse them in your grad
> override.
>
> Frédéric
[theano-users] Split Op (OpFromGraph) to save intermediate results for grad
I am trying to build an Op with a custom/optimized gradient formula. To override the automatic differentiation, I'm trying to use OpFromGraph.

The gradient formula can reuse intermediate results from the feed-forward pass, so I have tried to split the Op in two: Op1 computes the intermediate and final results and passes all of them to Op2; Op2 forwards the final result and takes care of the gradient computation given all the necessary values.

Note that the gradient of the loss wrt the intermediate results is never needed.

Below is what I believe to be a minimal working example of my problem. It exhibits a strange conversion error related to the gradient computation with the intermediate values. Please note the presence of an integral variable.

    import numpy as np
    import theano.tensor as T
    import theano


    def make_ops():
        x = T.vector()
        m = T.bvector()

        r = m.sum().astype('floatX')  # intermediate value
        z = x * m / r  # final result

        def grad_op1(inputs, output_gradients):
            return [
                output_gradients[0],  # gradient computation delegated to op2
                T.DisconnectedType()()  # variable has integral type
                # T.zeros_like(inputs[1])
            ]

        op1 = theano.OpFromGraph(
            inputs=[x, m],
            outputs=[z, m, r],
            grad_overrides=grad_op1,
            inline=True,
            name="op1")

        z = T.vector()
        r_forwarded = T.scalar()

        def grad_op2(inputs, output_gradients):
            _, m_, r_ = inputs
            dm_ = theano.gradient.DisconnectedType()(name="dm_")
            # I think the error could be around here  <<--
            # dr_ = theano.gradient.DisconnectedType()(name="dr_")
            dr_ = T.zeros_like(r_)
            return [m_ / r_, dm_, dr_]

        op2 = theano.OpFromGraph(
            inputs=[z, m, r_forwarded],
            outputs=[z],  # Op 2 forwards the precomputed output
            grad_overrides=grad_op2,
            inline=True,
            name="op2")

        return op1, op2


    def main():
        op1, op2 = make_ops()
        x = T.vector(name="x")
        m = T.bvector(name="m")
        z_intermediate, m_forwarded, r = op1(x, m)
        z = op2(z_intermediate, m, r)

        g = theano.grad(T.sum(z), wrt=x)
        print(g.eval({x: np.array([1., .3, .0, .2], dtype=np.float32),
                      m: np.array([1, 0, 1, 1], dtype=np.int8)}))


    if __name__ == "__main__":
        main()

(Note: I had tried to hijack my previous question thread with this problem but it went unnoticed, sorry for double posting.)

Thank you
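As a sanity check on the gradient formula itself, the expression `m / r` returned by `grad_op2` can be compared with finite differences in plain NumPy. This is only a sketch under the assumption that the cost is `sum(z)` with `z = x * m / r` and `r = sum(m)`, as in the example above; the helper names `forward` and `grad_sum_z_wrt_x` are hypothetical and nothing here goes through Theano or OpFromGraph.

```python
import numpy as np


def forward(x, m):
    """Return the final result z and the intermediate value r."""
    r = m.sum().astype(x.dtype)  # intermediate value, reused by the gradient
    z = x * m / r
    return z, r


def grad_sum_z_wrt_x(m, r):
    """Hand-derived gradient of sum(z) with respect to x; needs only m and r."""
    return m / r


x = np.array([1.0, 0.3, 0.0, 0.2])
m = np.array([1, 0, 1, 1], dtype=np.int8)

z, r = forward(x, m)
analytic = grad_sum_z_wrt_x(m, r)

# Central finite differences as an independent check of the formula.
eps = 1e-6
numeric = np.zeros_like(x)
for i in range(x.size):
    step = np.zeros_like(x)
    step[i] = eps
    plus = forward(x + step, m)[0].sum()
    minus = forward(x - step, m)[0].sum()
    numeric[i] = (plus - minus) / (2 * eps)

print(analytic)
print(numeric)
```

The check only confirms that the gradient with respect to `x` depends on the saved values `m` and `r` alone, which is what motivates exposing them as extra outputs of the first op. It says nothing about the DisconnectedType conversion error.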
[theano-users] Re: Unused input error with chained OpFromGraph ops
Hello,

I still haven't managed to track the error down. Below is a shorter example that triggers the error. It seems Theano tries to create a variable for the output gradient of a node through which I do not backpropagate. At some point it hits a DisconnectedType instance and raises an error.

    import numpy as np
    import theano.tensor as T
    import theano


    def make_ops():
        x_var = T.vector()
        m_var = T.bvector()

        r = m_var.sum().astype('floatX')
        z = x_var * m_var / r

        def grad_op1(inputs, output_gradients):
            return [
                output_gradients[0],  # computation delegated to op2
                theano.gradient.DisconnectedType()()
            ]

        op1 = theano.OpFromGraph(
            inputs=[x_var, m_var],
            outputs=[z, r],
            grad_overrides=grad_op1,
            inline=True,
            name="op1")

        return op1


    op1 = make_ops()
    x_var = T.vector()
    m_var = T.bvector()
    z, r = op1(x_var, m_var)

    g = theano.grad(T.sum(z), wrt=x_var)
    print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
                  m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))

Output:

    TypeError: Cannot convert Type DisconnectedType (of Variable <DisconnectedType>) into Type TensorType(float32, scalar). You can try to manually convert into a TensorType(float32, scalar).

    Process finished with exit code 1
[theano-users] Re: Unused input error with chained OpFromGraph ops
Hi,

Thank you for the suggestion; inlining actually makes more sense for what I am trying to do.

However, a casting issue arises when trying to compute the derivative with respect to the continuous input. If I understood correctly, a DisconnectedType variable should be returned as the gradient for integral inputs (or inputs with respect to which I don't need the derivative), right?

Below is the slightly modified code which illustrates this new issue:

    import numpy as np
    import theano.tensor as T
    import theano


    def make_ops():
        x_var = T.vector()
        m_var = T.bvector()

        r = m_var.sum().astype('floatX')
        z = x_var * m_var / r

        def grad_op1(inputs, output_gradients):
            return [
                output_gradients[0],  # computation delegated to op2
                theano.gradient.DisconnectedType()(),
            ]

        op1 = theano.OpFromGraph(
            inputs=[x_var, m_var],
            outputs=[z, r],
            grad_overrides=grad_op1,
            inline=True)

        z_var = T.vector()
        r_var = T.scalar()

        def grad_op2(inputs, output_gradients):
            _, m_, r_ = inputs
            return [
                m_ * r_,
                theano.gradient.DisconnectedType()(),
                theano.gradient.DisconnectedType()()
            ]

        op2 = theano.OpFromGraph(
            inputs=[z_var, m_var, r_var],
            outputs=[z_var],
            grad_overrides=grad_op2,
            inline=True)

        return op1, op2


    op1, op2 = make_ops()
    x_var = T.vector()
    m_var = T.bvector()
    z_, r = op1(x_var, m_var)
    z = op2(z_, m_var, r)

    g = theano.grad(T.sum(z), wrt=x_var)
    print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
                  m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))
[theano-users] Unused input error with chained OpFromGraph ops
Hi,

I am trying to split a computation over two ops in order to avoid spurious computations when computing the gradient. My current attempt uses a first op which returns the desired result for the forward part plus extra intermediate results. The second op just forwards the desired result, but its grad is overridden to compute the gradient based on the intermediate results.

In this configuration, Theano complains about unused inputs in the forward computation, because the intermediate results are not used by the forward method of the second op.

Is this expected behaviour or a bug?

    import numpy as np
    import theano.tensor as T
    import theano


    def make_ops():
        x_var = T.vector()
        m_var = T.bvector()

        r = m_var.sum().astype('floatX')
        z = x_var * m_var / r

        def grad_op1(inputs, output_gradients):
            return [
                output_gradients[0],  # computation delegated to op2
                theano.gradient.DisconnectedType()()
            ]

        op1 = theano.OpFromGraph(
            inputs=[x_var, m_var],
            outputs=[z, r],
            grad_overrides=grad_op1)

        z_var = T.vector()
        r_var = T.scalar()

        def grad_op2(inputs, output_gradients):
            _, m_, r_ = inputs
            return [
                m_ * r_,
                theano.gradient.DisconnectedType()(),
                theano.gradient.DisconnectedType()()
            ]

        op2 = theano.OpFromGraph(
            inputs=[z_var, m_var, r_var],
            outputs=[z_var],
            grad_overrides=grad_op2)

        return op1, op2


    op1, op2 = make_ops()
    x_var = T.vector()
    m_var = T.bvector()
    z_, r = op1(x_var, m_var)
    z = op2(z_, m_var, r)

    print(z_.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
                   m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))

    f = theano.function([x_var, m_var], [z], on_unused_input='ignore')  # raises anyway

    print(f(np.array([1., .3, .0, .2], dtype=np.float32),
            np.array([1, 0, 1, 1], dtype=np.int8)))

    # g = theano.grad(T.sum(z), wrt=x_var)
    # print(g.eval({x_var: np.array([1., .3, .0, .2], dtype=np.float32),
    #               m_var: np.array([1, 0, 1, 1], dtype=np.int8)}))
[theano-users] Propagate the gradient only for a subset of samples in the bottom layers
Hello,

I am trying out an architecture for video sequences where a small CNN extracts features from each frame and then feeds them to an LSTM.

|RNN|->|RNN|->|RNN|->|RNN|->|RNN|->|RNN|
  |      |      |      |      |      |
|CNN|  |CNN|  |CNN|  |CNN|  |CNN|  |CNN|

If possible, I would like to train the CNN and the LSTM jointly on full sequences (2000 frames). The forward-pass data that the CNN needs for backpropagation is too big to be stored for every frame, so I would like to know whether the training step can be split up as follows:

1. feed the samples forward through the CNN, but only store backpropagation data for a subset of the frames
2. propagate through the LSTM as usual
3. backpropagate down through the LSTM and update its parameters as usual
4. backpropagate down through the CNN for the samples that belong to the subset and update the parameters

theano.gradient.grad has a known_grads argument; maybe that could help?

Regards,
Nicolas
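The four steps above can be sketched in plain NumPy to make the idea concrete. This is an illustration under strong assumptions, not a Theano recipe: the "CNN" is replaced by a single linear map `W`, the gradients coming down from the recurrent part are random placeholders, and every name (`frames`, `grad_features`, `grad_W`) is hypothetical. The rescaling by the subset fraction is one possible choice, made so that the expectation over uniformly drawn subsets matches the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_in, n_feat = 8, 5, 3

frames = rng.normal(size=(n_frames, n_in))  # one input vector per frame
W = rng.normal(size=(n_feat, n_in))         # parameters of the stand-in "CNN"

# Step 1: forward pass of the per-frame feature extractor.
features = frames @ W.T                     # shape (n_frames, n_feat)

# Steps 2-3: assume the recurrent part has been run and backpropagated,
# yielding dLoss/dfeatures for every frame (random placeholders here).
grad_features = rng.normal(size=(n_frames, n_feat))


def grad_W(frame_idx):
    """Backpropagate into W using only the frames listed in frame_idx."""
    return grad_features[frame_idx].T @ frames[frame_idx]


# Step 4: backpropagate through the feature extractor for a subset only.
subset = rng.choice(n_frames, size=4, replace=False)
partial = grad_W(subset) * (n_frames / len(subset))

# Reference: the gradient that storing every frame would have given.
full = grad_W(np.arange(n_frames))

print(np.abs(partial - full).max())
```

In Theano, the `known_grads` argument mentioned above might play the role of `grad_features` here, with the CNN graph rebuilt only for the selected frames; that mapping is an untested assumption.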