[ https://issues.apache.org/jira/browse/ARROW-4313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748409#comment-16748409 ]
Areg Melik-Adamyan commented on ARROW-4313:
-------------------------------------------

I think it will be easier if we keep it a little simple in the beginning, so that we do not have to redo a lot in the future. Replies to the original list:

* Timestamp of benchmark run - *We should be careful: this is helpful, but you cannot rely on this timestamp, as there is no guarantee that the systems' clocks are synchronized. For purely informational purposes it is fine.*
* Git commit hash of codebase
* Machine unique name (sort of the "user id") - *Machine ID and machine information should go to a different database, as they can change, come and go; you do not want to keep that info tied to benchmarks.*
* CPU identification for machine, and clock frequency (in case of overclocking)
* CPU cache sizes (L1/L2/L3)
* Whether or not CPU throttling is enabled (if it can be easily determined) - *For benchmarking you should always set it to max; not fixing the governor adds unpredictable flakiness to the benchmarks. You also need to lock the machine while the benchmarks are running to prevent noise.*
* RAM size
* GPU identification (if any)
* Benchmark unique name - *For the start I would say yes, but it can quickly get out of control: e.g. you have TestFeatureA, then it gets flavors such as input size, and you start naming it TestFeatureA5GB, then TestFeatureA5GB-CPU, TestFeatureA5GB-GPU-Nvidia, TestFeatureA5GB-GPU-Radeon, and so on. The best-known method to keep this under control is a hierarchical name, or a unique ID with a benchmark table, which is kind of overkill for now.*
* Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python) - *Why would you need this? Maybe put it into the hierarchical name?*
* Benchmark time, plus mean and standard deviation if available, else NULL

> Define general benchmark database schema
> ----------------------------------------
>
>                 Key: ARROW-4313
>                 URL: https://issues.apache.org/jira/browse/ARROW-4313
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Benchmarking
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.13.0
>
>
> Some possible attributes that the benchmark database should track, to permit heterogeneity of hardware and programming languages
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
> see discussion on mailing list
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
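A rough sketch of the layout suggested in the comment (machine information kept in its own table, results referencing it by ID, and flavors such as input size, device or language folded into a hierarchical benchmark name), using Python's standard-library sqlite3 module. All table and column names, the `hierarchical_name` and `record_result` helpers, and the `key=value` naming convention are hypothetical illustrations, not an agreed Arrow schema. CPU throttling is deliberately not stored, on the assumption that the governor is pinned to max as recommended in the comment.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema sketch: names and columns are illustrative only.
# Machine info lives in its own table so it can change independently of results.
SCHEMA = """
CREATE TABLE machine (
    machine_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL UNIQUE,  -- the "user id" of the machine
    cpu_model     TEXT,
    cpu_freq_mhz  REAL,
    l1_cache_kb   INTEGER,
    l2_cache_kb   INTEGER,
    l3_cache_kb   INTEGER,
    ram_gb        REAL,
    gpu_model     TEXT                   -- NULL if no GPU
);

CREATE TABLE benchmark_result (
    result_id       INTEGER PRIMARY KEY,
    benchmark_name  TEXT NOT NULL,       -- hierarchical, e.g. FeatureA/device=cpu/size=5GB
    git_commit      TEXT NOT NULL,
    machine_id      INTEGER NOT NULL REFERENCES machine(machine_id),
    run_timestamp   TEXT,                -- informational only; clocks may not be synced
    time_sec        REAL NOT NULL,
    mean_sec        REAL,                -- NULL if not available
    stddev_sec      REAL                 -- NULL if not available
);
"""


def hierarchical_name(base, **params):
    """Build a name such as 'FeatureA/device=cpu/size=5GB' (hypothetical convention).

    Parameters are sorted by key so the same flavor always gets the same name.
    """
    parts = [base] + [f"{key}={value}" for key, value in sorted(params.items())]
    return "/".join(parts)


def record_result(conn, machine_name, base, params, git_commit, time_sec,
                  mean_sec=None, stddev_sec=None):
    """Insert one result row, looking up the machine by its unique name."""
    row = conn.execute(
        "SELECT machine_id FROM machine WHERE name = ?", (machine_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown machine: {machine_name}")
    conn.execute(
        "INSERT INTO benchmark_result (benchmark_name, git_commit, machine_id, "
        "run_timestamp, time_sec, mean_sec, stddev_sec) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (hierarchical_name(base, **params), git_commit, row[0],
         datetime.now(timezone.utc).isoformat(), time_sec, mean_sec, stddev_sec),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO machine (name, cpu_model, ram_gb) VALUES (?, ?, ?)",
        ("bench-01", "example-cpu", 64.0),
    )
    # The language goes into the hierarchical name instead of a separate column.
    record_result(conn, "bench-01", "FeatureA",
                  {"size": "5GB", "device": "cpu", "lang": "cpp"},
                  git_commit="0123abc", time_sec=1.25)
    for name, commit, t, stddev in conn.execute(
        "SELECT benchmark_name, git_commit, time_sec, stddev_sec FROM benchmark_result"
    ):
        print(name, commit, t, stddev)
```

Running it prints one row, `FeatureA/device=cpu/lang=cpp/size=5GB 0123abc 1.25 None`; the missing standard deviation is stored as NULL, matching "else NULL" in the original list. Sorting the parameters keeps names stable regardless of the order in which flavors are added, which is one simple way to avoid the TestFeatureA5GB-GPU-Nvidia style of naming.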