Re: [PR] Add per-operator reference pages under reference/operators [incubator-texera-site]

via GitHub Thu, 16 Apr 2026 18:41:04 -0700


Copilot commented on code in PR #34:
URL: https://github.com/apache/incubator-texera-site/pull/34#discussion_r3097410970



##########
content/en/docs/reference/operators/visualization/basic/hierarchy-chart.md:
##########
@@ -0,0 +1,24 @@
+---
+title: "Hierarchy Chart"
+description: "Visualize data in hierarchy"
+category: "Basic"
+operator_type: "HierarchyChart"
+tags: [visualization, basic]
+---
+
+[Home](../../../) > [Visualization](../../) > [Basic](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| Chart Type | ✓ | `treemap`, `sunburst` | - | Treemap or Sunburst |
+| Hierarchy Path | ✓ | List<Hierarchy Section> | - | Hierarchy of attributes from a higher-level<br>category to lower-level category |
+| ↳ Attribute Name | ✓ | Column | - |  |

Review Comment:
   Same Markdown/HTML issue as elsewhere: `List<Hierarchy Section>` contains 
angle brackets (and a space), so it is likely to be parsed as an HTML tag and 
will not render correctly. Please escape the angle brackets or wrap the entire 
type cell in backticks (and consider fixing this in the generator).



##########
content/en/docs/reference/operators/machine-learning/machine-learning-general/machine-learning-scorer.md:
##########
@@ -0,0 +1,25 @@
+---
+title: "Machine Learning Scorer"
+description: "Scorer for machine learning models"
+category: "Machine Learning General"
+operator_type: "Scorer"
+tags: [machine-learning, machine-learning-general]
+---
+
+[Home](../../../) > [Machine Learning](../../) > [Machine Learning General](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| Regression | ✓ | Boolean | false | Choose to solve a regression task |
+| Actual Value | ✓ | Column | - | Specify the label attribute |
+| Predicted Value | ✓ | Column | - | Specify the attribute generated by the model |
+| Scorer Functions |  | List | - | Select classification tasks metrics |
+| Scorer Functions |  | List | - | Select regression tasks metrics |
+

Review Comment:
   The "Scorer Functions" input property appears twice with different 
descriptions (classification vs regression). Duplicate property names in the 
table are ambiguous; please rename the rows to distinct labels or consolidate 
into a single row that explains it is task-dependent.
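
A check like the following could flag repeated property labels before a page is written. This is only a sketch: the generator's language and data model are not shown in this PR, so `duplicate_labels` and the `(label, description)` row shape are hypothetical.

```python
from collections import Counter


def duplicate_labels(rows):
    """Return property labels that occur more than once, in first-seen order.

    `rows` is an assumed shape: a list of (label, description) tuples.
    """
    labels = [label for label, _ in rows]
    counts = Counter(labels)
    # dict.fromkeys keeps the first occurrence of each label and preserves order.
    return [label for label in dict.fromkeys(labels) if counts[label] > 1]


rows = [
    ("Regression", "Choose to solve a regression task"),
    ("Scorer Functions", "Select classification tasks metrics"),
    ("Scorer Functions", "Select regression tasks metrics"),
]
print(duplicate_labels(rows))
```

A non-empty result could fail the generation step or prompt a rename such as appending the task name to the label.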



##########
content/en/docs/reference/operators/machine-learning/sklearn/sklearn-prediction.md:
##########
@@ -0,0 +1,23 @@
+---
+title: "Sklearn Prediction"
+description: "Skleanr Prediction Operator"
+category: "Sklearn"

Review Comment:
   Typo in the description ("Skleanr"). This appears both in front matter and 
in the Sklearn index, so fixing it will improve search snippets and the 
category listing.



##########
content/en/docs/reference/operators/user-defined-functions/r/r-udf.md:
##########
@@ -0,0 +1,44 @@
+---
+title: "R UDF"
+description: "User-defined function operator in R script"
+category: "R"
+operator_type: "RUDF"
+tags: [user-defined-functions, r]
+---
+
+[Home](../../../) > [User Defined Functions](../../) > [R](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| R UDF Script | ✓ | Code (r) | `See template below` | Input your code here |
+| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to lunch |
+| Use Tuple API? | ✓ | Boolean | false | Check this box to use Tuple API, leave unchecked<br>to use Table API |

Review Comment:
   Typo in the Worker count description: "lunch" should be "launch" (parallel 
workers to launch).



##########
content/en/docs/reference/operators/visualization/media/_index.md:
##########
@@ -0,0 +1,20 @@
+---
+title: "Media"
+description: "Operators in the Media category"
+weight: 5
+categories: [Operators]
+tags: [visualization, media]
+---
+
+[Home](../../) > [Visualization](../) > Media
+
+## Operators
+
+| Operator | Description |
+|----------|-------------|
+| [HTML Visualizer](html-visualizer/) | Render the result of HTML content |
+| [Image Visualizer](image-visualizer/) | visualize image content |
+| [URL Visualizer](url-visualizer/) | Render the content of URL |
+| [Word Cloud](word-cloud/) | Generate word cloud for   texts |

Review Comment:
   This table row repeats the same extra whitespace in the operator description 
("Generate word cloud for   texts"). It should be cleaned up so the category 
index reads correctly.



##########
content/en/docs/reference/operators/machine-learning/sklearn/_index.md:
##########
@@ -0,0 +1,48 @@
+---
+title: "Sklearn"
+description: "Operators in the Sklearn category"
+weight: 1
+categories: [Operators]
+tags: [machine-learning, sklearn]
+---
+
+[Home](../../) > [Machine Learning](../) > Sklearn
+
+## Subcategories
+
+- [Sklearn Training](sklearn-training/)
+
+## Operators
+
+| Operator | Description |
+|----------|-------------|
+| [Adaptive Boosting](adaptive-boosting/) | Sklearn Adaptive Boosting Operator |
+| [Bagging](bagging/) | Sklearn Bagging Operator |
+| [Bernoulli Naive Bayes](bernoulli-naive-bayes/) | Sklearn Bernoulli Naive Bayes Operator |
+| [Complement Naive Bayes](complement-naive-bayes/) | Sklearn Complement Naive Bayes Operator |
+| [Decision Tree](decision-tree/) | Sklearn Decision Tree Operator |
+| [Dummy Classifier](dummy-classifier/) | Sklearn Dummy Classifier Operator |
+| [Extra Tree](extra-tree/) | Sklearn Extra Tree Operator |
+| [Extra Trees](extra-trees/) | Sklearn Extra Trees Operator |
+| [Gaussian Naive Bayes](gaussian-naive-bayes/) | Sklearn Gaussian Naive Bayes Operator |
+| [Gradient Boosting](gradient-boosting/) | Sklearn Gradient Boosting Operator |
+| [K-nearest Neighbors](k-nearest-neighbors/) | Sklearn K-nearest Neighbors Operator |
+| [Linear Perceptron](linear-perceptron/) | Sklearn Linear Perceptron Operator |
+| [Linear Regression](linear-regression/) | Sklearn Linear Regression Operator |
+| [Linear Support Vector Machine](linear-support-vector-machine/) | Sklearn Linear Support Vector Machine Operator |
+| [Logistic Regression](logistic-regression/) | Sklearn Logistic Regression Operator |
+| [Logistic Regression Cross Validation](logistic-regression-cross-validation/) | Sklearn Logistic Regression Cross Validation Operator |
+| [Multi-layer Perceptron](multi-layer-perceptron/) | Sklearn Multi-layer Perceptron Operator |
+| [Multinomial Naive Bayes](multinomial-naive-bayes/) | Sklearn Multinomial Naive Bayes Operator |
+| [Nearest Centroid](nearest-centroid/) | Sklearn Nearest Centroid Operator |
+| [Passive Aggressive](passive-aggressive/) | Sklearn Passive Aggressive Operator |
+| [Probability Calibration](probability-calibration/) | Sklearn Probability Calibration Operator |
+| [Random Forest](random-forest/) | Sklearn Random Forest Operator |
+| [Ridge Regression](ridge-regression/) | Sklearn Ridge Regression Operator |
+| [Ridge Regression Cross Validation](ridge-regression-cross-validation/) | Sklearn Ridge Regression Cross Validation Operator |
+| [Sklearn Prediction](sklearn-prediction/) | Skleanr Prediction Operator |
+| [Sklearn Testing](sklearn-testing/) | It will generate scorers for Sklearn model |

Review Comment:
   The Sklearn category index repeats the typo "Skleanr" in the operator 
description. Please correct it to "Sklearn" so the listing reads properly.



##########
content/en/docs/reference/operators/user-defined-functions/java/java-udf.md:
##########
@@ -0,0 +1,48 @@
+---
+title: "Java UDF"
+description: "User-defined function operator in Java script"
+category: "Java"
+operator_type: "JavaUDF"
+tags: [user-defined-functions, java]
+---
+
+[Home](../../../) > [User Defined Functions](../../) > [Java](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| Java UDF script | ✓ | Code (java) | `See template below` | Input your code here |
+| Worker count | ✓ | Integer | 1 | Specify how many parallel workers to lunch |
+| Retain input columns | ✓ | Boolean | true | Keep the original input columns? |
+| Extra output column(s) |  | List<Attribute> | - | Name of the newly added output columns that the<br>UDF will produce, if any |

Review Comment:
   Typo in the Worker count description: "lunch" should be "launch" (parallel 
workers to launch).



##########
content/en/docs/reference/operators/data-cleaning/filter.md:
##########
@@ -0,0 +1,24 @@
+---
+title: "Filter"
+description: "Performs a filter operation using OR between multiple predicates"
+category: "Data Cleaning"
+operator_type: "Filter"
+tags: [data-cleaning]
+---
+
+[Home](../../) > [Data Cleaning](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| Predicates | ✓ | List<Filter Predicate> | - | Multiple predicates in OR |

Review Comment:
   The type value `List<Filter Predicate>` contains angle brackets, which 
Markdown treats as HTML. This will typically render as just `List` (or 
otherwise lose the generic type) in the generated site. Wrap the type in 
backticks or escape `<`/`>` so the full type shows up reliably (and apply the 
same fix across other generated `List<...>` / `Map<...>` types).
   ```suggestion
   | Predicates | ✓ | `List<Filter Predicate>` | - | Multiple predicates in OR |
   ```
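
If the fix is applied in the generator rather than by hand, the backtick wrapping could look roughly like this. It is a sketch under assumptions: the generator's actual language and function names are not visible in this PR, so `format_type_cell` is a hypothetical helper.

```python
def format_type_cell(type_text: str) -> str:
    """Wrap a type string in backticks when it contains angle brackets.

    Hypothetical helper; cells without angle brackets (for example `Column`
    or an already backticked enum list) are returned unchanged.
    """
    if "<" not in type_text and ">" not in type_text:
        return type_text
    if type_text.startswith("`") and type_text.endswith("`"):
        # Already a code span, so wrapping again would show literal backticks.
        return type_text
    return f"`{type_text}`"


print(format_type_cell("List<Filter Predicate>"))
print(format_type_cell("Column"))
```

Wrapping in a code span rather than emitting `&lt;`/`&gt;` keeps the Markdown source readable, at the cost of rendering the type in monospace.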



##########
content/en/docs/reference/operators/data-input/_index.md:
##########
@@ -0,0 +1,22 @@
+---
+title: "Data Input"
+description: "Operators in the Data Input category"
+weight: 1
+categories: [Operators]
+tags: [data-input]
+---
+
+[Home](../) > Data Input
+
+## Operators
+
+| Operator | Description |
+|----------|-------------|
+| [ File Scan](file-scan/) | Scan data from a  file |
+| [Arrow File Scan](arrow-file-scan/) | Scan data from a Arrow file |

Review Comment:
   The Data Input index lists " File Scan" with a leading space and repeats the 
double-space description ("Scan data from a  file"). Once the operator page is 
fixed, this entry should be updated to match so the index renders cleanly. The 
suggestion below also corrects "a Arrow file" to "an Arrow file".
   ```suggestion
   | [File Scan](file-scan/) | Scan data from a file |
   | [Arrow File Scan](arrow-file-scan/) | Scan data from an Arrow file |
   ```



##########
content/en/docs/reference/operators/visualization/basic/range-slider.md:
##########
@@ -0,0 +1,23 @@
+---
+title: "Range Slider"
+description: "Visualize data in a Range Slider"
+category: "Basic"
+operator_type: "RangeSlider"
+tags: [visualization, basic]
+---
+
+[Home](../../../) > [Visualization](../../) > [Basic](../)
+
+### Input Properties
+
+| Property | Requirement | Type | Default | Description |
+|----------|-------------|------|---------|-------------|
+| Y-axis | ✓ | Column | - | The name of the column to represent y-axis |
+| X-axis | ✓ | Column | - | The name of the column to represent the x-axis |
+| Handle Duplicates |  | `Nothing`, `Mean`, `Sum` | NOTHING | How to handle duplicate values in y-axis |
+

Review Comment:
   The default value ("NOTHING") does not match any of the enumerated options 
("Nothing", "Mean", "Sum"). Please make the default exactly match one of the 
displayed options (or adjust the option casing) to avoid confusing users.
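
One way to keep the default and the option list in sync is to resolve the default against the displayed options when the page is generated. The sketch below assumes the mismatch is only a casing difference between an internal enum name and its display label; `resolve_default` is a hypothetical name, not part of the existing generator.

```python
def resolve_default(default, options):
    """Return the displayed option that matches `default`, ignoring case.

    Returns None when no option matches, so the caller can fail loudly
    instead of printing a default that users cannot select.
    """
    for option in options:
        if option.lower() == default.lower():
            return option
    return None


options = ["Nothing", "Mean", "Sum"]
print(resolve_default("NOTHING", options))
print(resolve_default("Median", options))
```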



##########
content/en/docs/reference/operators/visualization/media/word-cloud.md:
##########
@@ -0,0 +1,22 @@
+---
+title: "Word Cloud"
+description: "Generate word cloud for   texts"
+category: "Media"
+operator_type: "WordCloud"

Review Comment:
   The description contains extra whitespace ("Generate word cloud for   
texts"), which reads like a generation bug and will show up in listings/search. 
Please normalize it (e.g., single spaces).
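
A whitespace normalization step in the generator could be as small as the sketch below. It assumes descriptions and titles are plain single-line strings; `normalize_text` is a hypothetical name.

```python
def normalize_text(text: str) -> str:
    """Collapse runs of whitespace to one space and trim both ends."""
    # str.split() with no arguments splits on any whitespace run and
    # discards leading and trailing whitespace.
    return " ".join(text.split())


print(normalize_text("Generate word cloud for   texts"))
print(normalize_text(" File Scan"))
```

Applying the same step to titles would also cover leading-space cases such as " File Scan".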



##########
content/en/docs/reference/operators/data-input/file-scan.md:
##########
@@ -0,0 +1,26 @@
+---
+title: " File Scan"
+description: "Scan data from a  file"

Review Comment:
   The front matter has a leading space in the title (" File Scan"), and the 
description has double spaces ("Scan data from a  file"). This will affect the 
rendered page title and any autogenerated nav/TOC entries.
   ```suggestion
   title: "File Scan"
   description: "Scan data from a file"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

