kunwp1 opened a new issue, #4588: URL: https://github.com/apache/texera/issues/4588
Dataset (too large to upload to GitHub): https://texera.eye.som.uci.edu/dashboard/hub/dataset/result/detail/6

Model: Claude-Haiku-4-5

Issue: Given the following prompt, the agent could not create a workflow that reads the dataset and ended up with a `litellm.RateLimitError: AnthropicException`.

```
# Dataset

1. TexeraChatbot_testdata_DDX41.txt.gz

This TexeraChatbot_testdata_DDX41.txt.gz file includes a cell-by-gene raw count matrix, comprising 15,307 single cells (in columns) and 33,696 features (gene symbols, in rows). The first row contains cell barcodes, and the first column contains gene symbols.

2. TexeraChatbot_testdata_DDX41_obs.txt.gz

This TexeraChatbot_testdata_DDX41_obs.txt.gz file includes cell-level metadata for cell barcodes. The column “barcode” is the unique identifier for each cell. Other columns are described below:

- nCount_RNA: total UMI counts per cell
- nFeature_RNA: total number of detected features per cell
- percent.mt: percentage of mitochondrial reads per cell
- pANN: proportion of artificial nearest neighbors calculated by DoubletFinder
- nuclear_fraction: nuclear fraction score, capturing the proportion of reads derived from intronic regions; calculated using the DropletQC R package
- sampleid: 2 unique sample IDs, i.e., DDX41 for DDX41 cKO mouse and WT for wild-type mouse. The genotype for the conditional knockout mouse is Ddx41 fl/fl; ChxCre, and the genotype for the wild-type mouse is Ddx41fl/fl.
- majorclass: 12 annotated major cell classes, including AC, BC, Cone, HC, MG, Microglia, RGC, Rod, Endothelial, Pericyte, RPE, and Astrocyte
- celltype: high-resolution cell type annotation

In summary, the dataset comprises 15,307 single cells derived from 2 unique sample IDs, annotated into 12 major cell classes.

3. TexeraChatbot_testdata_DDX41_var.txt.gz

This TexeraChatbot_testdata_DDX41_var.txt.gz file includes the gene features for the single-cell dataset. The “symbol” column contains the gene symbols for the 33,696 features, including both protein-coding and non-coding genes. Gene identifiers are gene symbols, and the RNA genome build used is the mouse reference (GRCm39).
```
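For anyone reproducing this, here is a minimal Python sketch for checking the layout described in the prompt (barcodes in the first row, gene symbols in the first column) without loading the full 33,696-row matrix. It is not part of Texera or of the agent. The function name `peek_matrix` is hypothetical, and the tab delimiter is an assumption, since the prompt does not say how the `.txt.gz` files are delimited.

```python
import csv
import gzip
import itertools


def peek_matrix(path, delimiter="\t", n_rows=3):
    """Return the header row and the first n_rows data rows of a gzipped matrix.

    Assumes a delimited text file (tab by default, which is a guess) where
    the first row holds cell barcodes and the first column holds gene symbols.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # First row: cell barcodes (may or may not start with a corner label).
        header = next(reader)
        rows = []
        # Stream only the first n_rows data rows; the rest is never read.
        for row in itertools.islice(reader, n_rows):
            gene_symbol, counts = row[0], row[1:]
            rows.append((gene_symbol, counts))
    return header, rows


# Example call (the file must be downloaded from the dataset link first):
# header, rows = peek_matrix("TexeraChatbot_testdata_DDX41.txt.gz")
# print(len(header), [gene for gene, _ in rows])
```

If the header comes back one field shorter than the data rows, the file was likely written without a corner label, and the barcodes start at `header[0]` rather than `header[1]`.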
