asuhan commented on a change in pull request #11864:
URL: https://github.com/apache/arrow/pull/11864#discussion_r773578575



##########
File path: cpp/src/arrow/compute/kernels/scalar_random.cc
##########
@@ -0,0 +1,99 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <memory>
+#include <mutex>
+#include <random>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernel.h"
+#include "arrow/compute/kernels/common.h"
+#include "arrow/compute/registry.h"
+#include "arrow/util/pcg_random.h"
+
+namespace arrow {
+namespace compute {
+namespace internal {
+
+namespace {
+
+// Generates a random floating point number in range [0, 1).
+double generate_uniform(random::pcg64_fast& rng) {
+  // This equation is copied from numpy. It calculates `rng() / 2^64` and
+  // the return value is strictly less than 1.
+  static_assert(random::pcg64_fast::min() == 0ULL, "");
+  static_assert(random::pcg64_fast::max() == ~0ULL, "");
+  return (rng() >> 11) * (1.0 / 9007199254740992.0);
+}
+
+using RandomState = OptionsWrapper<RandomOptions>;
+
+Status ExecRandom(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+  static random::pcg64_fast seed_gen((std::random_device{})());

Review comment:
       I had already taken a look at using seed sequences, but it's not clear 
to me whether we want its trade-offs - or maybe I don't understand well enough 
how it's supposed to work. My first concern is thread safety, since under the 
hood it calls `std::random_device`, so this would force us to construct the PCG 
generator under a lock or make it thread local. My second concern is 
performance, I don't think it'd be as fast as what we're doing in the current 
version.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to