aglinxinyuan commented on code in PR #4669:
URL: https://github.com/apache/texera/pull/4669#discussion_r3198736029
##########
config-service/src/main/scala/org/apache/texera/service/resource/ConfigResource.scala:
##########
@@ -57,7 +58,10 @@ class ConfigResource {
),
"activeTimeInMinutes" ->
GuiConfig.guiWorkflowWorkspaceActiveTimeInMinutes,
"copilotEnabled" -> GuiConfig.guiWorkflowWorkspaceCopilotEnabled,
- "limitColumns" -> GuiConfig.guiWorkflowWorkspaceLimitColumns,
+ "limitColumns" -> SiteSettings.getInt(
+ "result_table_columns_per_batch",
+ GuiConfig.guiWorkflowWorkspaceLimitColumns
+ ),
Review Comment:
The admin UI no longer writes this key — leftover from the original design.
Please revert this hunk back to the direct
`GuiConfig.guiWorkflowWorkspaceLimitColumns` read.
##########
common/dao/src/main/scala/org/apache/texera/dao/SiteSettings.scala:
##########
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.texera.dao
+
+import org.jooq.impl.DSL
+
+import scala.util.Try
+
+/**
+ * Read-side accessor for the `site_settings` key/value table that admin pages
+ * write through. Centralises the "look up by key, parse, fall back on any
+ * failure" pattern that previously lived inline in ConfigResource,
+ * CSVScanSourceOpExec, and DatasetResource.
+ *
+ * Failures swallowed by the outer Try include: SqlServer not initialised
+ * (e.g. on workers in distributed mode), no row for the key, and value that
+ * can't be parsed. In all of these cases the caller's default takes over.
Review Comment:
0% test coverage. Please extract the parse-or-fallback logic into a pure
function and add a unit test.
##########
common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/source/scan/csv/CSVScanSourceOpExec.scala:
##########
@@ -106,3 +112,41 @@ class CSVScanSourceOpExec private[csv] (descString:
String) extends SourceOperat
}
}
}
+
+object CSVScanSourceOpExec {
+ val DEFAULT_MAX_COLUMNS = 512
+
+ def getMaxColumns: Int =
+ SiteSettings.getInt("csv_parser_max_columns", DEFAULT_MAX_COLUMNS)
+
+ /**
+ * Wraps `parser.parseNext()` so a column-count overflow is reported to the
user
+ * as a clear instruction rather than a deep Univocity stack trace. Other
parser
+ * failures are rethrown unchanged.
+ *
+ * The thrown RuntimeException's message bubbles up through
DataProcessor.handleExecutorException
+ * and becomes the title of the console message that drives the top-of-page
toast.
+ */
+ def parseNextRow(parser: CsvParser, maxColumns: Int): Array[String] = {
+ try parser.parseNext()
+ catch {
+ case e: TextParsingException if isColumnOverflow(e, maxColumns) =>
+ throw new RuntimeException(columnOverflowMessage(maxColumns), e)
+ }
+ }
+
+ private[csv] def isColumnOverflow(e: TextParsingException, maxColumns: Int):
Boolean =
+ Option(e.getCause)
+ .collect { case aioobe: ArrayIndexOutOfBoundsException => aioobe }
+ .exists(aioobe => aioobeIndex(aioobe).forall(_ == maxColumns))
Review Comment:
`forall` returns true on `None`. If the JDK changes the AIOOBE message
format again, an unrelated AIOOBE would be reported as a column-overflow. Use
`.exists(_ == maxColumns)`.
##########
common/config/src/main/resources/default.conf:
##########
@@ -94,3 +94,8 @@ dataset {
multipart_upload_chunk_size_mib = 50
multipart_upload_chunk_size_mib = ${?DATASET_MULTIPART_UPLOAD_CHUNK_SIZE_MIB}
}
+
+csv {
Review Comment:
This block is not read anywhere — the default actually comes from the
hardcoded `DEFAULT_MAX_COLUMNS = 512` in `CSVScanSourceOpExec`, so
`CSV_PARSER_MAX_COLUMNS` has no effect. Either wire it up via a `CsvConfig`
object (like `DatasetConfig`) or remove the block.
Also, `csv.csv_parser_max_columns` repeats the namespace; should be
`csv.parser_max_columns` to match `dataset.multipart_upload_chunk_size_mib`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]