syed72 opened a new issue #6864: URL: https://github.com/apache/incubator-pinot/issues/6864
Hello,

Data ingestion is not working for data with JSON data types: no segments are created. I followed the steps in this StackOverflow question:
https://stackoverflow.com/questions/65886253/pinot-nested-json-ingestion

The example for JSON data types shipped in the build (githubEvents) does not work either:
https://github.com/apache/incubator-pinot/tree/master/pinot-tools/src/main/resources/examples/batch/githubEvents

Schema file:

    {
      "metricFieldSpecs": [],
      "dimensionFieldSpecs": [
        { "dataType": "STRING", "name": "name" },
        { "dataType": "LONG", "name": "age" },
        { "dataType": "STRING", "name": "subjects_str" },
        { "dataType": "STRING", "name": "subjects_name", "singleValueField": false },
        { "dataType": "STRING", "name": "subjects_grade", "singleValueField": false }
      ],
      "dateTimeFieldSpecs": [],
      "schemaName": "myTable"
    }

Table config:

    {
      "tableName": "myTable",
      "tableType": "OFFLINE",
      "segmentsConfig": {
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "myTable",
        "replication": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "invertedIndexColumns": [],
        "noDictionaryColumns": [ "subjects_str" ],
        "jsonIndexColumns": [ "subjects_str" ]
      },
      "metadata": { "customConfigs": {} },
      "ingestionConfig": {
        "batchIngestionConfig": {
          "segmentIngestionType": "APPEND",
          "segmentIngestionFrequency": "DAILY",
          "batchConfigMaps": [],
          "segmentNameSpec": {},
          "pushSpec": {}
        },
        "transformConfigs": [
          { "columnName": "subjects_str", "transformFunction": "jsonFormat(subjects)" },
          { "columnName": "subjects_name", "transformFunction": "jsonPathArray(subjects, '$.[*].name')" },
          { "columnName": "subjects_grade", "transformFunction": "jsonPathArray(subjects, '$.[*].grade')" }
        ]
      }
    }

Data.json:

    {"name":"Pete","age":24,"subjects":[{"name":"maths","grade":"A"},{"name":"maths","grade":"B--"}]}
    {"name":"Pete1","age":23,"subjects":[{"name":"maths","grade":"A+"},{"name":"maths","grade":"B--"}]}
    {"name":"Pete2","age":25,"subjects":[{"name":"maths","grade":"A++"},{"name":"maths","grade":"B--"}]}
    {"name":"Pete3","age":26,"subjects":[{"name":"maths","grade":"A+++"},{"name":"maths","grade":"B--"}]}

Please help me resolve this issue.

Ingestion job output (no error):

    bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/ingestionJobSpec.yaml
    SegmentGenerationJobSpec:
    !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
    cleanUpOutputDir: false
    excludeFileNamePattern: null
    executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner, segmentMetadataPushJobRunnerClassName: null, segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner, segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
    includeFileNamePattern: glob:**/*.json
    inputDirURI: examples/batch/jsontype/rawdata
    jobType: SegmentCreationAndTarPush
    outputDirURI: examples/batch/jsontype/segments
    overwriteOutput: true
    pinotClusterSpecs:
    - {controllerURI: 'http://localhost:9000'}
    pinotFSSpecs:
    - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
    pushJobSpec: {pushAttempts: 2, pushParallelism: 1, pushRetryIntervalMillis: 1000, segmentUriPrefix: null, segmentUriSuffix: null}
    recordReaderSpec: {className: org.apache.pinot.plugin.inputformat.json.JSONRecordReader, configClassName: null, configs: null, dataFormat: json}
    segmentCreationJobParallelism: 0
    segmentNameGeneratorSpec: null
    tableSpec: {schemaURI: 'http://localhost:9000/tables/myTable/schema', tableConfigURI: 'http://localhost:9000/tables/myTable', tableName: myTable}
    tlsSpec: null
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    Creating an executor service with 1 threads(Job parallelism: 0, available cores: 40.)
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Submitting one Segment Generation Task for file:/home/sas/apache-pinot-incubating-0.7.1-bin/examples/batch/jsontype/rawdata/test.json
    Initialized FunctionRegistry with 119 functions: [fromepochminutesbucket, arrayunionint, codepoint, mod, sha256, year, yearofweek, upper, arraycontainsstring, arraydistinctstring, bytestohex, tojsonmapstr, trim, timezoneminute, sqrt, togeometry, normalize, fromepochdays, arraydistinctint, exp, jsonpathlong, yow, toepochhoursrounded, lower, toutf8, concat, ceil, todatetime, jsonpathstring, substr, dayofyear, contains, jsonpatharray, arrayindexofint, fromepochhoursbucket, arrayindexofstring, minus, arrayunionstring, toepochhours, toepochdaysrounded, millisecond, fromepochhours, arrayreversestring, dow, doy, min, toepochsecondsrounded, strpos, jsonpath, tosphericalgeography, fromepochsecondsbucket, max, reverse, hammingdistance, stpoint, abs, timezonehour, toepochseconds, arrayconcatint, quarter, md5, ln, toepochminutes, arraysortstring, replace, strrpos, jsonpathdouble, stastext, second, arraysortint, split, fromepochdaysbucket, lpad, day, toepochminutesrounded, fromdatetime, fromepochseconds, arrayconcatstring, base64encode, ltrim, arraysliceint, chr, sha, plus, base64decode, month, arraycontainsint, toepochminutesbucket, startswith, week, jsonformat, sha512, arrayslicestring, fromepochminutes, remove, dayofmonth, times, hour, rpad, arrayremovestring, now, divide, bigdecimaltobytes, floor, toepochsecondsbucket, toepochdaysbucket, hextobytes, rtrim, length, toepochhoursbucket, bytestobigdecimal, toepochdays, arrayreverseint, datetrunc, minute, round, dayofweek, arrayremoveint, weekofyear] in 942ms
    Using class: org.apache.pinot.plugin.inputformat.json.JSONRecordReader to read segment, ignoring configured file format: AVRO
    Finished building StatsCollector!
    Collected stats for 4 documents
    Using fixed length dictionary for column: subjects_grade, size: 20
    Created dictionary for STRING column: subjects_grade with cardinality: 5, max length in bytes: 4, range: A to B--
    Using fixed length dictionary for column: subjects_name, size: 5
    Created dictionary for STRING column: subjects_name with cardinality: 1, max length in bytes: 5, range: maths to maths
    Using fixed length dictionary for column: name, size: 20
    Created dictionary for STRING column: name with cardinality: 4, max length in bytes: 5, range: Pete to Pete3
    Created dictionary for LONG column: age with cardinality: 4, range: 23 to 26
    Start building IndexCreator!
    Finished records indexing in IndexCreator!
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@4e31276e] for table myTable.

Reference: https://apache-pinot.slack.com/archives/C011C9JHN7R/p1619532619119600

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
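As a sanity check on the transform configs, here is a minimal sketch in plain Python of what the three transforms are expected to produce for the four sample records. It is only an approximation: `json.dumps` and a list comprehension stand in for Pinot's `jsonFormat` and `jsonPathArray`, and the helper names `json_format`, `json_path_array`, and `transform` are hypothetical, not Pinot APIs.

```python
import json

# The four sample records from Data.json in this report.
RECORDS = [
    {"name": "Pete", "age": 24, "subjects": [{"name": "maths", "grade": "A"}, {"name": "maths", "grade": "B--"}]},
    {"name": "Pete1", "age": 23, "subjects": [{"name": "maths", "grade": "A+"}, {"name": "maths", "grade": "B--"}]},
    {"name": "Pete2", "age": 25, "subjects": [{"name": "maths", "grade": "A++"}, {"name": "maths", "grade": "B--"}]},
    {"name": "Pete3", "age": 26, "subjects": [{"name": "maths", "grade": "A+++"}, {"name": "maths", "grade": "B--"}]},
]


def json_format(value):
    """Hypothetical stand-in for jsonFormat(subjects): serialize to a JSON string."""
    return json.dumps(value, separators=(",", ":"))


def json_path_array(items, key):
    """Hypothetical stand-in for jsonPathArray(subjects, '$.[*].<key>')."""
    return [item[key] for item in items]


def transform(record):
    """Build the row that the schema in this report describes."""
    subjects = record["subjects"]
    return {
        "name": record["name"],
        "age": record["age"],
        "subjects_str": json_format(subjects),
        "subjects_name": json_path_array(subjects, "name"),
        "subjects_grade": json_path_array(subjects, "grade"),
    }


rows = [transform(r) for r in RECORDS]

# Distinct values per multi-value column, comparable to the dictionary stats in the log.
distinct_names = sorted({n for row in rows for n in row["subjects_name"]})
distinct_grades = sorted({g for row in rows for g in row["subjects_grade"]})

print(distinct_names)   # ['maths']
print(distinct_grades)  # ['A', 'A+', 'A++', 'A+++', 'B--']
```

The one distinct subject name and five distinct grades match the cardinalities reported in the log (`subjects_name` with cardinality 1, `subjects_grade` with cardinality 5, range A to B--). That suggests, without proving it, that the transforms ran as configured and that the problem occurs later, somewhere between index creation and the empty push list in `Start pushing segments: []`.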
