markap14 commented on code in PR #7894:
URL: https://github.com/apache/nifi/pull/7894#discussion_r1377927376


##########
nifi-python-extensions/nifi-text-embeddings-module/src/main/python/vectorstores/PutChroma.py:
##########
@@ -0,0 +1,125 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+
+from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
+from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope
+import ChromaUtils
+import EmbeddingUtils
+
+
+class PutChroma(FlowFileTransform):
+    class Java:
+        implements = ['org.apache.nifi.python.processor.FlowFileTransform']
+
+    class ProcessorDetails:
+        version = '2.0.0-SNAPSHOT'
+        description = """Publishes JSON data to a Chroma VectorDB. The Incoming data must be in single JSON per Line format, each with two keys: 'text' and 'metadata'.

Review Comment:
   In all Gen AI examples that I've come across, the structure is JSON and has 
a 'text' or similar property along with a 'metadata' property. So I tried to 
make this as intuitive as possible. This is also the format that is produced by 
the ParseDocument processor. Generally, I expect the flow to look like:
   `(some source) -> ParseDocument -> ChunkDocument -> PutChroma`
   And so this format keeps things simple. Unfortunately, we do not yet have 
the @UseCase and @MultiProcessorUseCase capabilities built out for 
Python-based processors, so this hasn't been clearly documented yet. But this 
will definitely be something we'll want to highlight.
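
   To make the expected input concrete, here is a rough sketch of what "single JSON per line" with 'text' and 'metadata' keys could look like, and how such content might be read. This is only an illustration, not the code in PutChroma.py: the helper name `parse_json_lines`, the sample metadata keys (`source`, `chunk_index`), and the choice to default missing metadata to an empty dict are all assumptions made for the example.

```python
import json


def parse_json_lines(payload: str):
    """Parse one-JSON-object-per-line text into (text, metadata) pairs.

    Hypothetical helper for illustration only; not part of PutChroma.
    """
    docs = []
    for line_number, line in enumerate(payload.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if "text" not in record:
            raise ValueError(f"line {line_number}: missing 'text' key")
        # Assumption: treat absent metadata as an empty dict.
        docs.append((record["text"], record.get("metadata", {})))
    return docs


# Two records in the shape described above; the metadata keys are made up.
sample = "\n".join([
    json.dumps({"text": "NiFi moves data.",
                "metadata": {"source": "a.txt", "chunk_index": 0}}),
    json.dumps({"text": "Chroma stores vectors.",
                "metadata": {"source": "a.txt", "chunk_index": 1}}),
])

docs = parse_json_lines(sample)
for text, metadata in docs:
    print(text, metadata)
```

   Each line stands alone as a JSON document, so a chunking step can emit one record per chunk and a downstream processor can read the FlowFile line by line.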



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org