Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Karl Rupp
Hi,

  Why is data pointless? I'd rather have only a few datapoints on new
 hardware out there rather than having absolutely no data at all.


 I mean, the data is pretty useful because it tells us about the best
 default kernel for large square matrices, but it is not very useful if
 we want to build a general input-dependent model, as it requires in my
 experience more than 1000 data points.

This is true. So this calls for a hierarchical approach:
  Level 1: Just a couple of known kernels for a given data size, which 
are compared on the target machine.
  Level 2: A full tuning set for one data size on the target
  Level 3: All ~1000 points for building a model

Execution times between these levels vary significantly: while almost 
all users will go through Level 1 anyway, only a few will have the 
patience to wait for results on Level 2. Level 3 will be mostly for us 
to have a 'normalized' process for building performance models. Either 
way, if others join (machine learning community?), that would be great!
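
To illustrate, Level 1 could look roughly like the following Python 
sketch (the candidate 'kernels' and the timing harness here are 
placeholder stand-ins for this mail, not actual ViennaCL API):

import time
import numpy as np

def pick_fastest(candidates, warmup=2, reps=5):
    """Level 1: time a handful of known kernels, return the fastest."""
    timings = {}
    for name, kernel in candidates.items():
        for _ in range(warmup):        # discard cold-start effects
            kernel()
        start = time.perf_counter()
        for _ in range(reps):
            kernel()
        timings[name] = (time.perf_counter() - start) / reps
    return min(timings, key=timings.get), timings

# Placeholder candidates; in the GUI these would be real kernel launches.
A, B = np.random.rand(512, 512), np.random.rand(512, 512)
print(pick_fastest({"variant_1": lambda: A @ B,
                    "variant_2": lambda: np.dot(A, B)}))

Level 2 would then run the same loop over a full tuning set of 
configurations, and Level 3 repeats it over the ~1000 problem sizes.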


 I'd rather refrain from running Python scripts from the benchmark
 GUI. This is intended to be an end-user tool. Those interested in
 running from Python should take the Python code (i.e. PyViennaCL)
 directly.


 Are you sure? It would not take a lot of effort to have an optional way
 to call the Python script with the proper arguments from the auto-tuner,
 as long as the user provides the path and has all the necessary
 dependencies.

The second half of the last sentence is the problem. I expect 80% of 
users to run on Windows, where anything but a 'double click installer' 
is a non-standard process. If Namik has time left by the end of the 
summer, we can look into that, but we first need to focus on our target 
audience.


 Such cases are probably only interesting for the 'expert settings'
 tab in the GUI, as these parameters only make sense to people who
 *really* know what they are doing (and are willing to invest the time).
 For bloggers, journalists, etc., who just want to quickly get some
 performance datapoints for the very latest hardware, this is usually
 not of interest. We need to focus on serving the main audience first
 and then watch out for fruitful directions on how to extend it further.


 Of course! I've been referring to the expert settings tab from the
 beginning :)

Ah, please say so :-)

Best regards,
Karli




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Namik Karovic
Hello,

Apologies for not replying earlier, I've been quite busy these last two
days.

So far I have been exploring the advantages/disadvantages of using
QML/QtQuick vs a traditional widget-based GUI. QML has some great design
features that could improve the overall user experience and aren't easily
implemented when using widgets. I was originally planning to develop some
parts using QML (animations and charts) and integrate them with the main
widget-based GUI. However, I am now exploring the possibility of doing the
entire GUI in QML. Suggestions on which approach to choose are welcome.

Reading through your discussion about expert benchmark settings, I see
that I probably should have spent more time studying the autotuner and
benchmark codes :/ I understand that there is a great need for expert
benchmark customization, and I hope to succeed in making that part as
detailed as possible, but there should be a certain limit to the level
of detail. What I'm saying is I'd rather not spend time developing
features that will be used only a couple of times. Surely there are
some details that aren't of critical importance?

It would be great if you guys could agree on what expert details are of
greatest priority. I'm going to start studying the autotuner and benchmark
codes so I can better understand what needs to be done.

Best regards,
Namik




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Philippe Tillet
Hey Namik,


2014-05-06 19:43 GMT+02:00 Namik Karovic namik.karo...@gmail.com:

 Hello,

 Apologies for not replying earlier, I've been quite busy these last two
 days.


Don't worry ;)

So far I have been exploring the advantages/disadvantages of using
 QML/QtQuick vs a traditional widget-based GUI. QML has some great design
 features that could improve the overall user experience and aren't easily
 implemented when using widgets. I was originally planning to develop some
 parts using QML (animations and charts) and integrate them with the main
 widget-based GUI. However, I am now exploring the possibility of doing the
 entire GUI in QML. Suggestions on which approach to choose are welcome.


Unfortunately, I don't know much about Qt, so I probably couldn't help
here. However, keep in mind that we aim for maximum portability. I would
tend to think that QML is more portable across languages, and so I would
say go for it, as long as you don't lose portability elsewhere.

Reading through your discussion about expert benchmark settings, I see
 that I probably should have spent more time studying the autotuner and
 benchmark codes :/ I understand that there is a great need for expert
 benchmark customization, and I hope to succeed in making that part as
 detailed as possible, but there should be a certain limit to the level
 of detail. What I'm saying is I'd rather not spend time developing
 features that will be used only a couple of times. Surely there are
 some details that aren't of critical importance?


 It would be great if you guys could agree on what expert details are of
 greatest priority. I'm going to start studying the autotuner and benchmark
 codes so I can better understand what needs to be done.


I think that the most important part of the project is the
intuitiveness/functionality of the GUI. Keep in mind that most of your
user base will have a limited amount of time, and that anything beyond
double-click+coffee break will probably be ignored ;)

I really believe that there is no need to read any code related to the
auto-tuner, as it is disappearing. Re-implementing an exhaustive search for
one particular size for the GUI will not be a huge challenge, so don't
worry too much about it.
This thread is exclusively dedicated to possible features in the expert
tab, which is not a priority for now (but it's still good to have some
mid-term perspective when starting a project).

That being said, I believe that the Basic options should include:
- Benchmarking of as many routines as possible: BLAS, FFT, solvers, etc.
- Simple exhaustive-search auto-tuning for whatever supports it: what
could this hardware ideally give on this problem? (see the sketch below)
- Export of the benchmark results to an open database
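
As a rough sketch of the exhaustive-search item above (the parameter 
names and the timing stand-in are made up for illustration; the real 
parameter space would come from the generator):

import itertools
import time
import numpy as np

def time_config(work_group, tile):
    """Hypothetical stand-in: build and launch a kernel with these
    parameters and return its runtime in seconds."""
    n = 512
    A, B = np.random.rand(n, n), np.random.rand(n, n)
    start = time.perf_counter()
    A @ B                        # placeholder for the generated kernel
    return time.perf_counter() - start

# Exhaustive search over a small grid, for one fixed problem size.
grid = itertools.product((32, 64, 128, 256), (2, 4, 8))
best = min(grid, key=lambda cfg: time_config(*cfg))
print("best (work_group, tile):", best)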

I don't think you should worry about anything else as of now. I'll be
working rather actively on a command-line interface to some advanced
auto-tuning features.

Philippe




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-06 Thread Philippe Tillet
Hi,


2014-05-06 9:38 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:

 Hi,


  Why is data pointless? I'd rather have only a few datapoints on new

 hardware out there rather than having absolutely no data at all.


 I mean, the data is pretty useful because it tells us about the best
 default kernel for large square matrices, but it is not very useful if
 we want to build a general input-dependent model, as it requires in my
 experience more than 1000 data points.


 This is true. So this calls for a hierarchical approach:
  Level 1: Just a couple of known kernels for a given data size, which are
 compared on the target machine.
  Level 2: A full tuning set for one data size on the target
  Level 3: All ~1000 points for building a model

 Execution times between these levels vary significantly: While almost all
 users will go through Level 1 anyway, only a few will have the patience to
 wait for results on Level 2. Level 3 will be mostly for us to have a
 'normalized' process for building performance models. Either way, if others
 join (machine learning community?), that would be great!



  I'd rather refrain from running Python scripts from the benchmark
 GUI. This is intended to be an end-user tool. Those interested in
 running from Python should take the Python code (i.e. PyViennaCL)
 directly.


 Are you sure? It would not take a lot of effort to have an optional way
 to call the Python script with the proper arguments from the auto-tuner,
 as long as the user provides the path and has all the necessary
 dependencies.


 The second half of the last sentence is the problem. I expect 80% of
 users to run on Windows, where anything but a 'double click installer' is a
 non-standard process. If Namik has time left by the end of the summer, we
 can look into that, but we first need to focus on our target audience.


I think you're right. Namik's GUI should only provide Levels 1 and 2,
which do not require any Python. Since Level 3 would be an internal tool,
as you correctly pointed out, we could stick to a Python command-line
interface, or a rudimentary PyQt GUI.

Philippe






Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Karl Rupp
Hi,

(CC-ing viennacl-devel, as this is developer-talk ;-) )

 Either way, I want to let you know that the generator/auto-tuner is
 undergoing significant changes, and that you will, actually, not have to
 worry about it for your GSoC project. The generator will be used
 transparently via the viennacl::linalg:: functions, and the auto-tuner
 will be entirely moved to pyviennacl.

Well, I think this is not entirely unrelated. The purpose of the GUI is 
still to allow a broader community to feed us with benchmark data, so 
somehow the loop over all possible configurations is still essential. 
With an interface to Python I assume that an API to do exactly that will 
still be available ;-)



  There is, however, one additional point I'd like to discuss. The
 performance of all the algorithms you'll have to benchmark is highly
 dependent on the characteristics of the input data. For example, matrix
 products will behave very differently according to the size/shape of the
 input matrices. This is very important: this means that a good
 benchmarking GUI could help the users to design their system.
 Here's an example. Suppose that someone wants to solve the linear system
 A x = y.
 If, for his particular application, A is a 50,000x50,000 sparse matrix,
 then he could be greatly interested in knowing how he could pad A to
 achieve better performance. In that case, the benchmarking GUI could
 randomly explore R^2 beyond (50,000; 50,000), and potentially tell the
 user that, if he makes A a (50,500; 50,500) matrix, then he could
 improve his performance by, say, 10 or 20%.

For sparse matrices I don't believe in random patterns. The user usually 
has a particular application in mind, so I consider it more important to
  a) Allow users to feed the tuner with their own sparse matrix
  b) Allow users to select sparse matrices from the Florida matrix market
The second option is important for benchmark purposes and for comparison 
with data in the literature. We can also add a third option for random 
matrices, but it's certainly far less important.
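
For option a), reading a user-supplied matrix in Matrix Market format 
(the format the Florida collection ships in) is nearly a one-liner with 
SciPy; a small sketch, where the path is of course a placeholder:

from scipy.io import mmread

# .mtx files are the Matrix Market format used by the Florida collection.
A = mmread("user_matrix.mtx").tocsr()    # placeholder path
rows, cols = A.shape
print(f"{rows} x {cols}, nnz = {A.nnz}, "
      f"density = {A.nnz / (rows * cols):.2e}")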


 In the case of dense matrix
 products, one may even be able to double his performance by slightly
 altering the size of the input matrices.

Okay, this is only about adjusting the padding parameter and should be 
transparently included in the tuning process anyway, shouldn't it?

Best regards,
Karli




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Philippe Tillet
Hi,


2014-05-05 9:18 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:

 Hi,

 (CC-ing viennacl-devel, as this is developer-talk ;-) )


  Either way, I want to let you know that the generator/auto-tuner is
 undergoing significant changes, and that you will, actually, not have to
 worry about it for your GSoC project. The generator will be used
 transparently via the viennacl::linalg:: functions, and the auto-tuner
 will be entirely moved to pyviennacl.


 Well, I think this is not entirely unrelated. The purpose of the GUI is
 still to allow a broader community to feed us with benchmark data, so
 somehow the loop over all possible configurations is still essential. With
 an interface to Python I assume that an API to do exactly that will still
 be available ;-)


Well, looping over all the possible configurations for one particular
problem size is good for benchmarking purposes only; the data generated
this way will not be re-usable unless we can make some assumptions about
the input-data size. That is, if the GUI only auto-tunes GEMV/GEMM for
large square matrices, then we will collect a lot of pointless data.
Instead, the GUI should export a model which, given some input data sizes
and a hardware configuration, is able to predict the optimal kernel. This
is why the auto-tuner is being moved to pyviennacl.
However, the GUI could/should indeed still be able to execute the
corresponding python scripts.
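
One simple shape such an exported model could take (purely illustrative, 
not an agreed format): a table of benchmarked sizes mapped to their best 
kernel parameters, with nearest-neighbor lookup for unmeasured sizes:

import math

# Hypothetical tuning results: (M, N, K) -> best kernel parameters.
model = {
    (256, 256, 256):    {"work_group": 64,  "tile": 4},
    (2048, 2048, 2048): {"work_group": 128, "tile": 8},
}

def predict(m, n, k):
    """Return the parameters of the nearest benchmarked size; distances
    are taken on a log scale, since performance regimes tend to scale
    geometrically with problem size."""
    def dist(size):
        return sum((math.log(a) - math.log(b)) ** 2
                   for a, b in zip(size, (m, n, k)))
    return model[min(model, key=dist)]

print(predict(1500, 1500, 1500))    # picks the 2048^3 entry here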



  There is, however, one additional point I'd like to discuss. The
 performance of all the algorithms you'll have to benchmark is highly
 dependent on the characteristics of the input data. For example, matrix
 products will behave very differently according to the size/shape of the
 input matrices. This is very important: this means that a good
 benchmarking GUI could help the users to design their system.
 Here's an example. Suppose that someone wants to solve the linear system
 A x = y.

 If, for his particular application, A is a 50,000x50,000 sparse matrix,
 then he could be greatly interested in knowing how he could pad A to
 achieve better performance. In that case, the benchmarking GUI could
 randomly explore R^2 beyond (50,000; 50,000), and potentially tell the
 user that, if he makes A a (50,500; 50,500) matrix, then he could
 improve his performance by, say, 10 or 20%.


 For sparse matrices I don't believe in random patterns. The user usually
 has a particular application in mind, so I consider it more important to
  a) Allow users to feed the tuner with their own sparse matrix
  b) Allow users to select sparse matrices from the Florida matrix market
 The second option is important for benchmark purposes and for comparison
 with data in the literature. We can also add a third option for random
 matrices, but it's certainly far less important.



We could also try to describe a sparse matrix by a few parameters (number
of rows/cols, format, sparsity pattern, etc...) and use machine learning to
predict the optimal kernel given an arbitrary sparse matrix. For the
training data, we could use the Florida matrix market, indeed.
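
A rough sketch of that idea with scikit-learn (the features, labels and 
training matrices below are all invented for illustration; real training 
data would come from tuning runs over the collection):

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.ensemble import RandomForestClassifier

def features(A):
    """Describe a sparse matrix by a few scalar parameters."""
    A = A.tocsr()
    rows, cols = A.shape
    nnz_per_row = np.diff(A.indptr)
    return [rows, cols, A.nnz / (rows * cols),      # size and density
            nnz_per_row.mean(), nnz_per_row.std()]  # pattern summary

# Invented training set: matrices labeled with a 'best kernel' id.
train = [sparse_random(100 * (i + 1), 80, density=0.01 * (1 + i % 3))
         for i in range(12)]
labels = [i % 3 for i in range(12)]                 # hypothetical kernel ids

clf = RandomForestClassifier(n_estimators=20)
clf.fit([features(A) for A in train], labels)
print(clf.predict([features(sparse_random(500, 80, density=0.02))]))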





  In the case of dense matrix
 products, one may even be able to double his performance by slightly
 altering the size of the input matrices.


 Okay, this is only about adjusting the padding parameter and should be
 transparently included in the tuning process anyway, shouldn't it?


This is not exactly what I meant. Suppose that someone wants to compute the
dense matrix product:
A*B
where A is in R^{238, 2031}, and B is in R^{2031, 1240}.
Then, the auto-tuner should indeed find the optimal padding size, and A and
B would be transparently padded to multiples of 128: {256,2048} and {2048,
1280}.
 However, for some reason, using matrices of size {256, 2176} and {2176,
1280} may be worth it on SGEMM (but not on DGEMM), because 2048 could
trigger a lot of bank conflicts. Similarly, one might fall on a sweet spot
of his GPU for {256,2560}x{2560,1408}.  I don't think that ViennaCL should
handle this. I can think of some applications in the field of Artificial
Neural Networks, where one may want to resize the layers of his neural
network so as to fall on some sweet spots of his GPU.
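
The padding arithmetic from the numbers above, as a quick sketch 
(rounding up to a multiple of 128, as in the example; which of the 
larger candidate sizes actually wins would have to be measured per 
device):

def pad(n, multiple=128):
    """Round n up to the next multiple: 238 -> 256, 2031 -> 2048."""
    return -(-n // multiple) * multiple

A, B = (238, 2031), (2031, 1240)
print(tuple(map(pad, A)), tuple(map(pad, B)))    # (256, 2048) (2048, 1280)

# Candidates slightly beyond the minimal padding, e.g. to step around
# power-of-two strides that can trigger bank conflicts:
print([pad(2031) + k * 128 for k in range(3)])   # [2048, 2176, 2304]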

Philippe


 Best regards,
 Karli




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Karl Rupp
Hi,

 Well, I think this is not entirely unrelated. The purpose of the GUI
 is still to allow a broader community to feed us with benchmark
 data, so somehow the loop over all possible configurations is still
 essential. With an interface to Python I assume that an API to do
 exactly that will still be available ;-)


 Well, looping over all the possible configurations for one particular
 problem size is good for benchmarking purposes only; the data generated
 this way will not be re-usable unless we can make some assumptions about
 the input-data size.

The data is reusable, of course assuming that one knows the matrix sizes 
it has been obtained for.


 That is, if the GUI only auto-tunes GEMV/GEMM for large
 square matrices, then we will collect a lot of pointless data.

Why is data pointless? I'd rather have only a few datapoints on new 
hardware out there rather than having absolutely no data at all.


 Instead,
 the GUI should export a model which, given some input data sizes and a
 hardware configuration, is able to predict the optimal kernel. This is
 why the auto-tuner is being moved to pyviennacl.
 However, the GUI could/should indeed still be able to execute the
 corresponding python scripts.

I'd rather refrain from running Python scripts from the benchmark GUI. 
This is intended to be an end-user tool. Those interested in running 
from Python should take the Python code (i.e. PyViennaCL) directly.


 For sparse matrices I don't believe in random patterns. The user
 usually has a particular application in mind, so I consider it more
 important to
   a) Allow users to feed the tuner with their own sparse matrix
   b) Allow users to select sparse matrices from the Florida matrix
 market
 The second option is important for benchmark purposes and for
 comparison with data in the literature. We can also add a third
 option for random matrices, but it's certainly far less important.



 We could also try to describe a sparse matrix by a few parameters
 (number of rows/cols, format, sparsity pattern, etc...) and use machine
 learning to predict the optimal kernel given an arbitrary sparse matrix.
 For the training data, we could use the Florida matrix market, indeed.

I agree with this approach. Everything is better than using a fixed work 
group size as we do now (even though this is how other libraries deal 
with the problem as well).


 In the case of dense matrix
 products, one may even be able to double his performance by slightly
 altering the size of the input matrices.


 Okay, this is only about adjusting the padding parameter and should be
 transparently included in the tuning process anyway, shouldn't it?


 This is not exactly what I meant. Suppose that someone wants to compute
 the dense matrix product:
 A*B
 where A is in R^{238, 2031}, and B is in R^{2031, 1240}.
 Then, the auto-tuner should indeed find the optimal padding size, and A
 and B would be transparently padded to multiples of 128: {256,2048} and
 {2048, 1280}.
   However, for some reason, using matrices of size {256, 2176} and
 {2176, 1280} may be worth it on SGEMM (but not on DGEMM), because 2048
 could trigger a lot of bank conflicts. Similarly, one might fall on a
 sweet spot of his GPU for {256,2560}x{2560,1408}.  I don't think that
 ViennaCL should handle this. I can think of some applications in the
 field of Artificial Neural Networks, where one may want to resize the
 layers of his neural network so as to fall on some sweet spots of his GPU.

Such cases are probably only interesting for the 'expert settings' tab 
in the GUI, as these parameters only make sense to people who *really* 
know what they are doing (and are willing to invest the time). For 
bloggers, journalists, etc., who just want to quickly get some 
performance datapoints for the very latest hardware, this is usually 
not of interest. We need to focus on serving the main audience first 
and then watch out for fruitful directions on how to extend it further.

Best regards,
Karli




Re: [ViennaCL-devel] Benchmark GUI warmup

2014-05-05 Thread Philippe Tillet
Hi hi,


2014-05-05 21:49 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at:

 Hi,

  Well, I think this is not entirely unrelated. The purpose of the GUI
 is still to allow a broader community to feed us with benchmark
 data, so somehow the loop over all possible configurations is still
 essential. With an interface to Python I assume that an API to do
 exactly that will still be available ;-)


 Well, looping over all the possible configurations for one particular
 problem size is good for benchmarking purposes only; the data generated
 this way will not be re-usable unless we can make some assumptions about
 the input-data size.


 The data is reusable, of course assuming that one knows the matrix sizes
 it has been obtained for.


  That is, if the GUI only auto-tunes GEMV/GEMM for large
 square matrices, then we will collect a lot of pointless data.


 Why is data pointless? I'd rather have only a few datapoints on new
 hardware out there rather than having absolutely no data at all.


I mean, the data is pretty useful because it tells us about the best
default kernel for large square matrices, but it is not very useful if we
want to build a general input-dependent model, as it requires in my
experience more than 1000 data points.



  Instead,
 the GUI should export a model which, given some input data sizes and a
 hardware configuration, is able to predict the optimal kernel. This is
 why the auto-tuner is being moved to pyviennacl.
 However, the GUI could/should indeed still be able to execute the
 corresponding python scripts.


 I'd rather refrain from running Python scripts from the benchmark GUI.
 This is intended to be an end-user tool. Those interested in running from
 Python should take the Python code (i.e. PyViennaCL) directly.


Are you sure? It would not take a lot of effort to have an optional way to
call the Python script with the proper arguments from the auto-tuner, as
long as the user provides the path and has all the necessary dependencies.



  For sparse matrices I don't believe in random patterns. The user
 usually has a particular application in mind, so I consider it more
 important to
   a) Allow users to feed the tuner with their own sparse matrix
   b) Allow users to select sparse matrices from the Florida matrix
 market
 The second option is important for benchmark purposes and for
 comparison with data in the literature. We can also add a third
 option for random matrices, but it's certainly far less important.



 We could also try to describe a sparse matrix by a few parameters
 (number of rows/cols, format, sparsity pattern, etc...) and use machine
 learning to predict the optimal kernel given an arbitrary sparse matrix.
 For the training data, we could use the Florida matrix market, indeed.


 I agree with this approach. Everything is better than using a fixed work
 group size as we do now (even though this is how other libraries deal with
 the problem as well).


  In the case of dense matrix
 products, one may even be able to double his performance by
 slightly
 altering the size of the input matrices.


 Okay, this is only about adjusting the padding parameter and should be
 transparently included in the tuning process anyway, shouldn't it?


 This is not exactly what I meant. Suppose that someone wants to compute
 the dense matrix product:
 A*B
 where A is in R^{238, 2031}, and B is in R^{2031, 1240}.
 Then, the auto-tuner should indeed find the optimal padding size, and A
 and B would be transparently padded to multiples of 128: {256,2048} and
 {2048, 1280}.
   However, for some reason, using matrices of size {256, 2176} and
 {2176, 1280} may be worth it on SGEMM (but not on DGEMM), because 2048
 could trigger a lot of bank conflicts. Similarly, one might fall on a
 sweet spot of his GPU for {256,2560}x{2560,1408}.  I don't think that
 ViennaCL should handle this. I can think of some applications in the
 field of Artificial Neural Networks, where one may want to resize the
 layers of his neural network so as to fall on some sweet spots of his GPU.


 Such cases are probably only interesting for the 'expert settings' tab in
 the GUI, as these parameters only make sense to people who *really* know
 what they are doing (and are willing to invest the time). For bloggers,
 journalists, etc., who just want to quickly get some performance datapoints
 for the very latest hardware, this is usually not of interest. We need to
 focus on serving the main audience first and then watch out for fruitful
 directions on how to extend it further.


Of course! I've been referring to the expert settings tab from the
beginning :)

Philippe


 Best regards,
 Karli

