Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

via GitHub Wed, 22 May 2024 10:58:12 -0700


andygrove commented on code in PR #451:
URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1610422808



##########
spark/src/test/scala/org/apache/comet/DataGenerator.scala:
##########
@@ -95,4 +102,38 @@ class DataGenerator(r: Random) {
       Range(0, n).map(_ => r.nextLong())
   }
 
+  // Generate a random row according to the schema, the string field in the struct could be
+  // configured to generate strings by passing a stringGen function. Other types are delegated
+  // to Spark's RandomDataGenerator.
+  def generateRow(schema: StructType, stringGen: Option[() => String] = None): Row = {
+    val fields = mutable.ArrayBuffer.empty[Any]
+    schema.fields.foreach { f =>
+      f.dataType match {
+        case StructType(children) =>
+          fields += generateRow(StructType(children), stringGen)
+        case StringType if stringGen.isDefined =>
+          val gen = stringGen.get
+          val data = if (f.nullable && r.nextFloat() <= PROBABILITY_OF_NULL) {
+            null
+          } else {
+            gen()
+          }
+          fields += data
+        case _ =>
+          val generator = RandomDataGenerator.forType(f.dataType, f.nullable, r)
+          assert(generator.isDefined, "Unsupported type")
+          val gen = generator.get

Review Comment:
   Rather than using `isDefined` and `get`, it may be more idiomatic to use a `match` expression.
   
   ```scala
             val gen = RandomDataGenerator.forType(f.dataType, f.nullable, r) match {
               case Some(generator) => generator
               case None =>
                 throw new IllegalStateException(s"No RandomDataGenerator for type ${f.dataType}")
             }
             fields += gen()
   ```
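
A possible alternative, if the `match` feels verbose, is `Option.getOrElse` with a `throw` as the default. The sketch below is self-contained and does not use Spark: `forType` here is a hypothetical stand-in for `RandomDataGenerator.forType`, and the type names it accepts are made up for illustration.

```scala
import scala.util.Random

object OptionUnwrapSketch {
  // Hypothetical stand-in for Spark's RandomDataGenerator.forType (assumption):
  // returns a generator for a few known type names and None otherwise.
  def forType(typeName: String, r: Random): Option[() => Any] = typeName match {
    case "long"    => Some(() => r.nextLong())
    case "boolean" => Some(() => r.nextBoolean())
    case _         => None
  }

  // Unwrap with getOrElse. The default is a by-name parameter, so the
  // exception is only thrown when the Option is empty.
  def generate(typeName: String, r: Random): Any = {
    val gen = forType(typeName, r).getOrElse(
      throw new IllegalStateException(s"No generator for type $typeName"))
    gen()
  }
}
```

Either form gives a failure message that names the unsupported type, which the bare `assert(generator.isDefined, "Unsupported type")` does not.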



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


