Re: [ViennaCL-devel] Benchmark GUI warmup
Hi,

Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all.

I mean, the data is pretty useful because it tells us about the best default kernel for large square matrices, but it is not very useful if we want to build a general input-dependent model, as it requires in my experience more than 1000 data points.

This is true. So this calls for a hierarchical approach:
Level 1: Just a couple of known kernels for a given data size, which are compared on the target machine.
Level 2: A full tuning set for one data size on the target.
Level 3: All ~1000 points for building a model.
Execution times between these levels vary significantly: While almost all users will go through Level 1 anyway, only a few will have the patience to wait for results on Level 2. Level 3 will be mostly for us to have a 'normalized' process for building performance models. Either way, if others join (machine learning community?), that would be great!

I'd rather refrain from running Python scripts from the benchmark GUI. This is intended to be an end-user tool. Those interested in running from Python should take the Python code (i.e. PyViennaCL) directly.

Are you sure? It would not take a lot of effort to have an optional way to call the Python script with the proper arguments from the auto-tuner, as long as the user provides the path and has all the necessary dependencies.

The second half of the last sentence is the problem. I expect 80% of users to run on Windows, where anything but a 'double-click installer' is a non-standard process. If Namik has time left by the end of the summer, we can look into that, but we first need to focus on our target audience.

Such cases are probably only interesting for the 'expert settings' tab in the GUI, as these parameters only make sense to people who *really* know what they are doing (and willing to invest the time).
For bloggers, journalists, etc., who just want to quickly get some performance datapoints for the very latest hardware, this is usually not of interest. We need to focus on serving the main audience first and then watch out for fruitful directions on how to extend it further.

Of course! I've been referring to the expert settings tab from the beginning :)

Ah, please say so :-)

Best regards,
Karli

--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
* 3 signs your SCM is hindering your productivity
* Requirements for releasing software faster
* Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
___
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
Re: [ViennaCL-devel] Benchmark GUI warmup
Hello,

Apologies for not replying earlier, I've been quite busy these last two days.

So far I have been exploring the advantages/disadvantages of using QML/QtQuick vs a traditional widget-based GUI. QML has some great design features that could improve the overall user experience and aren't easily implemented when using widgets. I was originally planning to develop some parts using QML (animations and charts) and integrate them with the main widget-based GUI. However, I am now exploring the possibility of doing the entire GUI in QML. Suggestions on which approach to choose are welcome.

Reading through your discussion about expert benchmark settings I see that I probably should have spent more time studying the autotuner and benchmark codes :/ I understand that there is a great need for expert benchmark customization and I hope to succeed in making that part as detailed as possible, but there should be a certain limit to the extent of details. What I'm saying is I'd rather not spend time developing features that will be used only a couple of times. Surely there are some details that aren't of critical importance? It would be great if you guys could agree on what expert details are of greatest priority. I'm going to start studying the autotuner and benchmark codes so I can better understand what needs to be done.

Best regards,
Namik

On Tue, May 6, 2014 at 9:38 AM, Karl Rupp r...@iue.tuwien.ac.at wrote: Hi, Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all. I mean, the data is pretty useful because it tells us about the best default kernel for large square matrices, but it is not very useful if we want to build a general input-dependent model, as it requires in my experience more than 1000 data points. This is true. So this calls for a hierarchical approach: Level 1: Just a couple of known kernels for a given data size, which are compared on the target machine.
Re: [ViennaCL-devel] Benchmark GUI warmup
Hey Namik,

2014-05-06 19:43 GMT+02:00 Namik Karovic namik.karo...@gmail.com: Hello, Apologies for not replying earlier, I've been quite busy these last two days.

Don't worry ;)

So far I have been exploring the advantages/disadvantages of using QML/QtQuick vs a traditional widget-based GUI. QML has some great design features that could improve the overall user experience and aren't easily implemented when using widgets. I was originally planning to develop some parts using QML (animations and charts) and integrate them with the main widget-based GUI. However, I am now exploring the possibility of doing the entire GUI in QML. Suggestions on which approach to choose are welcome.

Unfortunately, I don't know much about Qt, so I probably couldn't help here. However, keep in mind that we aim for maximum portability. I would tend to think that QML is more portable across languages, and so I would say go for it, as long as you don't lose portability elsewhere.

Reading through your discussion about expert benchmark settings I see that I probably should have spent more time studying the autotuner and benchmark codes :/ I understand that there is a great need for expert benchmark customization and I hope to succeed in making that part as detailed as possible, but there should be a certain limit to the extent of details. What I'm saying is I'd rather not spend time developing features that will be used only a couple of times. Surely there are some details that aren't of critical importance? It would be great if you guys could agree on what expert details are of greatest priority. I'm going to start studying the autotuner and benchmark codes so I can better understand what needs to be done.

I think that the most important part of the project is the intuitiveness/functionality of the GUI.
Keep in mind that most of your userbase will have a limited amount of time, and that anything beyond double-click + coffee break will probably be ignored ;) I really believe that there is no need to read any code related to the auto-tuner, as it is disappearing. Re-implementing an exhaustive search for one particular size for the GUI will not be a huge challenge, so don't worry too much about it. This thread is exclusively dedicated to possible features in the expert tab, which is not a priority for now (but it's still good to have some mid-term perspective when starting a project).

That being said, I believe that the Basic options should include:
- Benchmarking of as many routines as possible: BLAS, FFT, solvers, etc.
- Simple exhaustive-search auto-tuning for what supports it: what could this hardware ideally give on this problem
- Export of the benchmark results to an open database

I don't think you should worry about anything else as of now. I'll be working rather actively on a command-line interface to some advanced auto-tuning features.

Philippe

Best regards, Namik

On Tue, May 6, 2014 at 9:38 AM, Karl Rupp r...@iue.tuwien.ac.at wrote: Hi, Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all. I mean, the data is pretty useful because it tells us about the best default kernel for large square matrices, but it is not very useful if we want to build a general input-dependent model, as it requires in my experience more than 1000 data points. This is true. So this calls for a hierarchical approach: Level 1: Just a couple of known kernels for a given data size, which are compared on the target machine. Level 2: A full tuning set for one data size on the target. Level 3: All ~1000 points for building a model. Execution times between these levels vary significantly: While almost all users will go through Level 1 anyway, only a few will have the patience to wait for results on Level 2.
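The "simple exhaustive search for one particular size" mentioned above can be sketched in a few lines. The configuration parameters (work group size, vector width) and the timing function below are illustrative stand-ins, not ViennaCL's actual generator API; a real implementation would launch and time the generated OpenCL kernel for each candidate:

```python
import itertools

def benchmark(config):
    """Stand-in for timing one kernel configuration. A real version would
    run the generated OpenCL kernel several times and return the median
    execution time; this synthetic cost keeps the sketch runnable."""
    work_group_size, vector_width = config
    return abs(work_group_size - 128) * 0.01 + abs(vector_width - 4) * 0.1

def exhaustive_search(work_group_sizes, vector_widths):
    """Try every configuration for one fixed problem size, keep the fastest."""
    best_config, best_time = None, float("inf")
    for config in itertools.product(work_group_sizes, vector_widths):
        t = benchmark(config)
        if t < best_time:
            best_config, best_time = config, t
    return best_config, best_time

best, elapsed = exhaustive_search([32, 64, 128, 256], [1, 2, 4, 8])
print(best)  # → (128, 4)
```

The GUI would run this once per routine for the user's (or a default) problem size and export the winning configuration together with the hardware description.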
Re: [ViennaCL-devel] Benchmark GUI warmup
Hi,

2014-05-06 9:38 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all. I mean, the data is pretty useful because it tells us about the best default kernel for large square matrices, but it is not very useful if we want to build a general input-dependent model, as it requires in my experience more than 1000 data points. This is true. So this calls for a hierarchical approach: Level 1: Just a couple of known kernels for a given data size, which are compared on the target machine. Level 2: A full tuning set for one data size on the target. Level 3: All ~1000 points for building a model. Execution times between these levels vary significantly: While almost all users will go through Level 1 anyway, only a few will have the patience to wait for results on Level 2. Level 3 will be mostly for us to have a 'normalized' process for building performance models. Either way, if others join (machine learning community?), that would be great! I'd rather refrain from running Python scripts from the benchmark GUI. This is intended to be an end-user tool. Those interested in running from Python should take the Python code (i.e. PyViennaCL) directly. Are you sure? It would not take a lot of effort to have an optional way to call the Python script with the proper arguments from the auto-tuner, as long as the user provides the path and has all the necessary dependencies. The second half of the last sentence is the problem. I expect 80% of users to run on Windows, where anything but a 'double-click installer' is a non-standard process. If Namik has time left by the end of the summer, we can look into that, but we first need to focus on our target audience.

I think you're right. Namik's GUI should only provide Levels 1 and 2, which do not require any Python.
Since Level 3 would be an internal tool, as you correctly pointed out, we could stick to a Python command-line interface, or a rudimentary PyQt GUI.

Philippe

Such cases are probably only interesting for the 'expert settings' tab in the GUI, as these parameters only make sense to people who *really* know what they are doing (and willing to invest the time). For bloggers, journalists, etc., who just want to quickly get some performance datapoints for the very latest hardware, this is usually not of interest. We need to focus on serving the main audience first and then watch out for fruitful directions on how to extend it further. Of course! I've been referring to the expert settings tab from the beginning :) Ah, please say so :-)

Best regards,
Karli
Re: [ViennaCL-devel] Benchmark GUI warmup
Hi,

(CC-ing viennacl-devel, as this is developer talk ;-) )

Either way, I want to let you know that the generator/auto-tuner is undergoing significant changes, and that you will, actually, not have to worry about it for your GSoC project. The generator will be used transparently via the viennacl::linalg:: functions, and the auto-tuner will be entirely moved to pyviennacl.

Well, I think this is not entirely unrelated. The purpose of the GUI is still to allow a broader community to feed us with benchmark data, so somehow the loop over all possible configurations is still essential. With an interface to Python I assume that an API to do exactly that will still be available ;-)

There is, however, one additional point I'd like to discuss. The performance of all the algorithms you'll have to benchmark is highly dependent on the characteristics of the input data. For example, matrix products will behave very differently according to the size/shape of the input matrices. This is very important: it means that a good benchmarking GUI could help users to design their system. Here's an example. Suppose that someone wants to solve the linear system A x = y. If, for his particular application, A is a 50,000 x 50,000 sparse matrix, then he could be greatly interested in knowing how he could pad A to achieve better performance. In that case, the benchmarking GUI could randomly explore sizes beyond (50,000; 50,000), and potentially tell the user that, if he makes A a (50,500; 50,500) matrix, then he could improve his performance by, say, 10 or 20%.

For sparse matrices I don't believe in random patterns. The user usually has a particular application in mind, so I consider it more important to
a) allow users to feed the tuner with their own sparse matrix
b) allow users to select sparse matrices from the Florida matrix market
The second option is important for benchmark purposes and for comparison with data in the literature.
We can also add a third option for random matrices, but it's certainly far less important.

In the case of dense matrix products, one may even be able to double his performance by slightly altering the size of the input matrices.

Okay, this is only about adjusting the padding parameter and should be transparently included in the tuning process anyway, shouldn't it?

Best regards,
Karli
Re: [ViennaCL-devel] Benchmark GUI warmup
Hi,

2014-05-05 9:18 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, (CC-ing viennacl-devel, as this is developer talk ;-) ) Either way, I want to let you know that the generator/auto-tuner is undergoing significant changes, and that you will, actually, not have to worry about it for your GSoC project. The generator will be used transparently via the viennacl::linalg:: functions, and the auto-tuner will be entirely moved to pyviennacl. Well, I think this is not entirely unrelated. The purpose of the GUI is still to allow a broader community to feed us with benchmark data, so somehow the loop over all possible configurations is still essential. With an interface to Python I assume that an API to do exactly that will still be available ;-)

Well, looping over all the possible configurations for one particular problem size is good for benchmarking purposes only; the data generated this way will not be re-usable unless we can make some assumptions about the input-data size. That is, if the GUI only auto-tunes GEMV/GEMM for large square matrices, then we will collect a lot of pointless data. Instead, the GUI should export a model which, given some input data sizes and a hardware configuration, is able to predict the optimal kernel. This is why the auto-tuner is being moved to pyviennacl. However, the GUI could/should indeed still be able to execute the corresponding Python scripts.

There is, however, one additional point I'd like to discuss. The performance of all the algorithms you'll have to benchmark is highly dependent on the characteristics of the input data. For example, matrix products will behave very differently according to the size/shape of the input matrices. This is very important: it means that a good benchmarking GUI could help users to design their system. Here's an example.
Suppose that someone wants to solve the linear system A x = y. If, for his particular application, A is a 50,000 x 50,000 sparse matrix, then he could be greatly interested in knowing how he could pad A to achieve better performance. In that case, the benchmarking GUI could randomly explore sizes beyond (50,000; 50,000), and potentially tell the user that, if he makes A a (50,500; 50,500) matrix, then he could improve his performance by, say, 10 or 20%.

For sparse matrices I don't believe in random patterns. The user usually has a particular application in mind, so I consider it more important to a) allow users to feed the tuner with their own sparse matrix b) allow users to select sparse matrices from the Florida matrix market. The second option is important for benchmark purposes and for comparison with data in the literature. We can also add a third option for random matrices, but it's certainly far less important.

We could also try to describe a sparse matrix by a few parameters (number of rows/cols, format, sparsity pattern, etc.) and use machine learning to predict the optimal kernel given an arbitrary sparse matrix. For the training data, we could use the Florida matrix market, indeed.

In the case of dense matrix products, one may even be able to double his performance by slightly altering the size of the input matrices.

Okay, this is only about adjusting the padding parameter and should be transparently included in the tuning process anyway, shouldn't it?

This is not exactly what I meant. Suppose that someone wants to compute the dense matrix product A*B, where A is in R^{238, 2031} and B is in R^{2031, 1240}. Then, the auto-tuner should indeed find the optimal padding size, and A and B would be transparently padded to multiples of 128: {256, 2048} and {2048, 1280}. However, for some reason, using matrices of size {256, 2176} and {2176, 1280} may be worth it for SGEMM (but not for DGEMM), because 2048 could trigger a lot of bank conflicts.
Similarly, one might fall on a sweet spot of his GPU for {256, 2560} x {2560, 1408}. I don't think that ViennaCL should handle this. I can think of some applications in the field of artificial neural networks, where one may want to resize the layers of his neural network so as to fall on some sweet spots of his GPU.

Philippe

Best regards,
Karli
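The padding arithmetic in the example above is easy to make concrete. A minimal sketch, assuming the usual round-up-to-a-multiple rule (the helper name is hypothetical, not a ViennaCL function):

```python
def pad(n, multiple=128):
    """Round n up to the next multiple, i.e. what transparent padding would do."""
    return ((n + multiple - 1) // multiple) * multiple

# Philippe's example: A in R^{238, 2031}, B in R^{2031, 1240}
assert (pad(238), pad(2031)) == (256, 2048)
assert (pad(2031), pad(1240)) == (2048, 1280)
```

The point of the thread is that this rounded size is not always the fastest one: 2048 may cause bank conflicts where 2176 would not, which is exactly what an exhaustive tuner (rather than a fixed rule) can discover.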
Re: [ViennaCL-devel] Benchmark GUI warmup
Hi,

Well, I think this is not entirely unrelated. The purpose of the GUI is still to allow a broader community to feed us with benchmark data, so somehow the loop over all possible configurations is still essential. With an interface to Python I assume that an API to do exactly that will still be available ;-)

Well, looping over all the possible configurations for one particular problem size is good for benchmarking purposes only; the data generated this way will not be re-usable unless we can make some assumptions about the input-data size.

The data is reusable, of course assuming that one knows the matrix sizes it has been obtained for.

That is, if the GUI only auto-tunes GEMV/GEMM for large square matrices, then we will collect a lot of pointless data.

Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all.

Instead, the GUI should export a model which, given some input data sizes and a hardware configuration, is able to predict the optimal kernel. This is why the auto-tuner is being moved to pyviennacl. However, the GUI could/should indeed still be able to execute the corresponding Python scripts.

I'd rather refrain from running Python scripts from the benchmark GUI. This is intended to be an end-user tool. Those interested in running from Python should take the Python code (i.e. PyViennaCL) directly.

For sparse matrices I don't believe in random patterns. The user usually has a particular application in mind, so I consider it more important to a) allow users to feed the tuner with their own sparse matrix b) allow users to select sparse matrices from the Florida matrix market. The second option is important for benchmark purposes and for comparison with data in the literature. We can also add a third option for random matrices, but it's certainly far less important. We could also try to describe a sparse matrix by a few parameters (number of rows/cols, format, sparsity pattern, etc.)
and use machine learning to predict the optimal kernel given an arbitrary sparse matrix. For the training data, we could use the Florida matrix market, indeed.

I agree with this approach. Everything is better than using a fixed work group size as we do now (even though this is how other libraries deal with the problem as well).

In the case of dense matrix products, one may even be able to double his performance by slightly altering the size of the input matrices.

Okay, this is only about adjusting the padding parameter and should be transparently included in the tuning process anyway, shouldn't it?

This is not exactly what I meant. Suppose that someone wants to compute the dense matrix product A*B, where A is in R^{238, 2031} and B is in R^{2031, 1240}. Then, the auto-tuner should indeed find the optimal padding size, and A and B would be transparently padded to multiples of 128: {256, 2048} and {2048, 1280}. However, for some reason, using matrices of size {256, 2176} and {2176, 1280} may be worth it for SGEMM (but not for DGEMM), because 2048 could trigger a lot of bank conflicts. Similarly, one might fall on a sweet spot of his GPU for {256, 2560} x {2560, 1408}. I don't think that ViennaCL should handle this. I can think of some applications in the field of artificial neural networks, where one may want to resize the layers of his neural network so as to fall on some sweet spots of his GPU.

Such cases are probably only interesting for the 'expert settings' tab in the GUI, as these parameters only make sense to people who *really* know what they are doing (and willing to invest the time). For bloggers, journalists, etc., who just want to quickly get some performance datapoints for the very latest hardware, this is usually not of interest. We need to focus on serving the main audience first and then watch out for fruitful directions on how to extend it further.

Best regards,
Karli
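The idea of describing a sparse matrix by a few parameters and predicting the optimal kernel from them can start very simply, e.g. as a nearest-neighbour lookup over previously tuned samples. The feature set, kernel names, and training data below are made up for illustration; a real model would be trained on tuning results from the Florida collection:

```python
import math

def features(rows, cols, nnz):
    """A few cheap descriptors of a sparse matrix (log sizes, avg nnz per row)."""
    return (math.log10(rows), math.log10(cols), nnz / rows)

# Hypothetical training set: descriptors -> best kernel found by the tuner.
training = [
    (features(10_000, 10_000, 50_000), "csr_vector"),
    (features(1_000_000, 1_000_000, 3_000_000), "csr_scalar"),
]

def predict(rows, cols, nnz):
    """Return the kernel whose training sample is closest in feature space."""
    f = features(rows, cols, nnz)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda sample: dist(sample[0], f))[1]

print(predict(50_000, 50_000, 250_000))  # → csr_vector
```

With ~1000 tuned data points (Level 3 above), the nearest-neighbour lookup could be swapped for a proper regression or tree-based model; the GUI would only need to ship the exported model, not the tuner itself.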
Re: [ViennaCL-devel] Benchmark GUI warmup
Hi hi,

2014-05-05 21:49 GMT+02:00 Karl Rupp r...@iue.tuwien.ac.at: Hi, Well, I think this is not entirely unrelated. The purpose of the GUI is still to allow a broader community to feed us with benchmark data, so somehow the loop over all possible configurations is still essential. With an interface to Python I assume that an API to do exactly that will still be available ;-) Well, looping over all the possible configurations for one particular problem size is good for benchmarking purposes only; the data generated this way will not be re-usable unless we can make some assumptions about the input-data size. The data is reusable, of course assuming that one knows the matrix sizes it has been obtained for. That is, if the GUI only auto-tunes GEMV/GEMM for large square matrices, then we will collect a lot of pointless data. Why is data pointless? I'd rather have only a few datapoints on new hardware out there rather than having absolutely no data at all.

I mean, the data is pretty useful because it tells us about the best default kernel for large square matrices, but it is not very useful if we want to build a general input-dependent model, as it requires in my experience more than 1000 data points.

Instead, the GUI should export a model which, given some input data sizes and a hardware configuration, is able to predict the optimal kernel. This is why the auto-tuner is being moved to pyviennacl. However, the GUI could/should indeed still be able to execute the corresponding Python scripts. I'd rather refrain from running Python scripts from the benchmark GUI. This is intended to be an end-user tool. Those interested in running from Python should take the Python code (i.e. PyViennaCL) directly.

Are you sure? It would not take a lot of effort to have an optional way to call the Python script with the proper arguments from the auto-tuner, as long as the user provides the path and has all the necessary dependencies.
For sparse matrices I don't believe in random patterns. The user usually has a particular application in mind, so I consider it more important to a) allow users to feed the tuner with their own sparse matrix b) allow users to select sparse matrices from the Florida matrix market. The second option is important for benchmark purposes and for comparison with data in the literature. We can also add a third option for random matrices, but it's certainly far less important. We could also try to describe a sparse matrix by a few parameters (number of rows/cols, format, sparsity pattern, etc.) and use machine learning to predict the optimal kernel given an arbitrary sparse matrix. For the training data, we could use the Florida matrix market, indeed. I agree with this approach. Everything is better than using a fixed work group size as we do now (even though this is how other libraries deal with the problem as well). In the case of dense matrix products, one may even be able to double his performance by slightly altering the size of the input matrices. Okay, this is only about adjusting the padding parameter and should be transparently included in the tuning process anyway, shouldn't it? This is not exactly what I meant. Suppose that someone wants to compute the dense matrix product A*B, where A is in R^{238, 2031} and B is in R^{2031, 1240}. Then, the auto-tuner should indeed find the optimal padding size, and A and B would be transparently padded to multiples of 128: {256, 2048} and {2048, 1280}. However, for some reason, using matrices of size {256, 2176} and {2176, 1280} may be worth it for SGEMM (but not for DGEMM), because 2048 could trigger a lot of bank conflicts. Similarly, one might fall on a sweet spot of his GPU for {256, 2560} x {2560, 1408}. I don't think that ViennaCL should handle this.
I can think of some applications in the field of artificial neural networks, where one may want to resize the layers of his neural network so as to fall on some sweet spots of his GPU. Such cases are probably only interesting for the 'expert settings' tab in the GUI, as these parameters only make sense to people who *really* know what they are doing (and willing to invest the time). For bloggers, journalists, etc., who just want to quickly get some performance datapoints for the very latest hardware, this is usually not of interest. We need to focus on serving the main audience first and then watch out for fruitful directions on how to extend it further.

Of course! I've been referring to the expert settings tab from the beginning :)

Philippe

Best regards,
Karli